Dylan Karaitiana · 22 Apr 2026 · 4 min read

Most agency SLAs are written for the agency, not the operator. “Response within 4 business hours” sounds reasonable until something is actually on fire and the response is “we’ve raised this with the team.” The SLA was met. The metric moved. Nothing changed.
What an SLA should actually guarantee
The thing an operator cares about isn’t response time — it’s resolution time, and specifically it’s “who is on this.” Resolution time depends on whether the right human is looking at the right log file. A 4-hour SLA against a ticket queue is just a 4-hour delay before the queue gets triaged. The actual fix happens after that.
We replaced ours with a much simpler guarantee: a named operator owns each pillar, their phone is in the engagement letter, and they answer it. If the named operator can’t answer, the named backup answers. There’s no triage step because there’s nothing to triage — the operator on the call is the operator who’ll fix it.
What this looks like operationally
For each project we name three roles in the engagement letter. The named operator (the person who scoped and delivers the work). The named backup (who covers leave, after-hours, and holiday). And the named escalation contact at Metrix (who hears about it if something goes systemic). All three are in your phone before the project starts.
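As a rough illustration, the three roles can be modelled as a small record that is checked before kickoff. This is a minimal sketch, not a description of any real tooling: the names `Contact`, `EngagementContacts`, and `missing_phones` are hypothetical, and the only rule it encodes is that every named role needs a phone number on file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    phone: str


@dataclass(frozen=True)
class EngagementContacts:
    """Hypothetical record of the three roles named in an engagement letter."""

    operator: Contact    # scoped and delivers the work
    backup: Contact      # covers leave, after-hours, and holidays
    escalation: Contact  # hears about it if something goes systemic

    def missing_phones(self) -> list[str]:
        """Return the role names that have no phone number recorded."""
        roles = {
            "operator": self.operator,
            "backup": self.backup,
            "escalation": self.escalation,
        }
        return [role for role, contact in roles.items() if not contact.phone.strip()]
```

An empty list from `missing_phones()` would mean all three numbers are on file before the project starts.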
Channel choice
Slack channels work for non-urgent work. We set one up per project, the operator’s team and ours both in it, lightly moderated. Most communication happens there.
For escalation we don’t use Slack. Phone first, SMS as backup, email last. Escalation means something’s actively breaking — Slack is asynchronous and we don’t want the operator wondering whether the message was seen.
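The escalation order can be written down as a short sketch. It assumes one particular reading of the rule: every named person is phoned before anyone is sent an SMS, and email comes last. The article does not spell out how people and channels interleave, so that ordering is an assumption, as are the names `CHANNELS`, `escalation_attempts`, and `escalate`. Slack is deliberately left out of the channel list.

```python
from itertools import product
from typing import Callable, Optional, Sequence

# Escalation channels in priority order; Slack is intentionally absent.
CHANNELS = ("phone", "sms", "email")


def escalation_attempts(people: Sequence[str]) -> list[tuple[str, str]]:
    """List (person, channel) attempts: all phone calls, then SMS, then email."""
    return [(person, channel) for channel, person in product(CHANNELS, people)]


def escalate(
    people: Sequence[str],
    try_contact: Callable[[str, str], bool],
) -> Optional[tuple[str, str]]:
    """Try each attempt in order and return the first that is acknowledged.

    `try_contact` is a caller-supplied function (hypothetical here) that
    returns True when the person acknowledges on that channel.
    """
    for person, channel in escalation_attempts(people):
        if try_contact(person, channel):
            return person, channel
    return None
```

Called with `["operator", "backup"]`, the attempt list starts with a phone call to the operator, then a phone call to the backup, and only then falls through to SMS and email.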
After-hours coverage
For Scale-plan projects (the embedded one), the named operator carries an after-hours phone. For Starter and Growth, after-hours coverage is opt-in and priced separately because we don’t want to pretend we offer 24/7 if we’re not staffed for it.
We’ve found most operators don’t actually need after-hours coverage for the pillars we work. Go-lives run during low-traffic windows we choose deliberately, and the rollback plan is rehearsed for that specific go-live, so if something goes wrong it goes wrong while the named operator is sitting at the desk. After-hours pages are rare and usually concern the foundation part of the operating system (hosting, DNS, certificates), which is where we recommend the Scale plan.
Why the queue is the tax
The queue is invisible until something goes wrong. Then it surfaces all at once. The brief enters the queue. Triage assigns priority. The wrong engineer gets it because triage doesn’t know which part is actually underperforming. They escalate, which puts it back in the queue. By the time the right engineer is looking at the right log file, it’s day two of an outage that should have been a 90-minute fix.
Removing the queue requires a smaller team. We can’t do this with 200 engineers — we’d lose the named-operator property. So we don’t try. We size capacity to the named-operator count, and when we’re full we close the project window. This shows up on the homepage as “Q3 capacity opens August” or similar.
The trade-off
Named escalation costs us scale. We can’t take every project that asks; we close windows when capacity is full. For the operator, it costs more per pillar than an offshore, queue-based agency. The trade is fewer points of failure, faster resolution when something breaks, and continuity over multiple quarters.
For operators in the AU$500K–$10M revenue band, where a single bad go-live costs more than the price difference, the trade works. For operators with very routine, very predictable pillar work, a queue-based model often delivers fine.
How to test it before signing
If you’re evaluating an agency on response, the test is simple: ask for the phone numbers of the people who’d actually do the work, before signing. If the answer is “you’ll get the team’s main number” or “it’ll come through the project manager”, the SLA they’re quoting you is queue-based. There’s nothing wrong with that, but you should know what you’re buying.
We publish ours in the engagement letter on day one. It’s the simplest commitment we make and the one we get asked about most.