Metrix

Go-live without the downtime

How we deliver fixes to production with rollback rehearsed.

  • Written by Dylan Karaitiana
  • 5 min read · Published 10 Apr 2026
  • Filed under Playbook · playbook, delivery, reliability

The go-live is the moment the project either earns its fee or doesn’t. We treat it like a controlled experiment, not a deployment. The discipline is in the rehearsal, not the night-of execution.

What “go-live” means in our vocabulary

We use “go-live” deliberately — not “deploy”, not “release”, not “ship”. A go-live is a coordinated switch from one operating state to another, with a known rollback path and a defined verification window. Deploys happen all the time without coordination. Go-lives are scheduled, planned, and observed.

A go-live might be switching DNS to point at a new application, migrating customer records into a new CRM, or promoting a new checkout from staging to production for 100% of traffic. Each is a go-live because it’s a state change with consequences if it goes wrong.

The three rules we don’t break

We’ve delivered enough go-lives to have a short list of rules that stay constant across every project. These are the three we don’t compromise on.

Rule one — Go live at low traffic for the operator’s actual customer base

Not a blanket “off-hours” rule. Each operator has a different traffic shape. For an AU B2B SaaS we go live Tuesday 14:30 Sydney — mid-afternoon, mid-week, when customers are at their desks but no critical reports are due. For an ecom site we go live 03:00 Sunday Australia time — middle of the customer’s night, off-peak globally. For a public-sector operator we go live Saturday morning — outside the constituent-facing window.

We profile traffic before scheduling. An “off-hours” rule that ships at 02:00 every time is convenient for the agency and bad for the customer: if a rollback is needed, it falls to a tired engineer. Where the traffic profile allows a daylight go-live, we get alert humans, fast triage, and a faster rollback.
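As a minimal sketch of the profiling step, assume hourly request counts have been exported from the operator’s analytics, tagged by weekday and hour in the operator’s local timezone. The function name `quietest_windows` and the sample numbers are hypothetical. The ranking is only an input to the decision: it says nothing about report deadlines or who is awake to triage, which still have to be weighed by a person.

```python
from collections import defaultdict

def quietest_windows(samples, top_n=3):
    """Rank (weekday, hour) slots by mean request count, lowest first.

    samples: iterable of (weekday, hour, requests) tuples, one per observed hour.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of requests, observations]
    for weekday, hour, requests in samples:
        totals[(weekday, hour)][0] += requests
        totals[(weekday, hour)][1] += 1
    means = {slot: total / count for slot, (total, count) in totals.items()}
    # Ties are broken by the slot itself so the ordering is deterministic.
    return sorted(means, key=lambda slot: (means[slot], slot))[:top_n]

# Hypothetical two weeks of data for three candidate slots.
samples = [
    ("Tue", 14, 420), ("Tue", 14, 380),
    ("Sun", 3, 12), ("Sun", 3, 18),
    ("Sat", 9, 95), ("Sat", 9, 105),
]
print(quietest_windows(samples, top_n=2))
```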

Rule two — Rollback rehearsed on a copy of your live site

Not “we have a rollback plan” — actually executed, end-to-end, against a copy of the live site the day before the go-live. We provision the copy, simulate the go-live state, then practice rolling back. We time it. If rollback takes more than four minutes, the go-live plan gets revised before we proceed. Sometimes that means changing the deployment strategy (blue-green vs in-place). Sometimes it means changing the data migration shape (dual-write vs cut-and-fill).
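The timing step can be sketched as follows. This assumes the rollback procedure has been wrapped in a single callable that runs against the copy of the live site; `rehearse_rollback` and `fake_rollback` are illustrative names, and the only number taken from the rule above is the four-minute limit.

```python
import time

ROLLBACK_BUDGET_SECONDS = 4 * 60  # the four-minute limit from the rule above

def rehearse_rollback(rollback, budget_seconds=ROLLBACK_BUDGET_SECONDS,
                      clock=time.monotonic):
    """Run the rollback callable once and report how long it took.

    Returns (elapsed_seconds, within_budget).
    """
    started = clock()
    rollback()
    elapsed = clock() - started
    return elapsed, elapsed <= budget_seconds

def fake_rollback():
    # Stand-in for the real procedure run against the copy of the live site.
    time.sleep(0.01)

elapsed, ok = rehearse_rollback(fake_rollback)
print(f"rollback took {elapsed:.2f}s, within budget: {ok}")
```

A monotonic clock is used so that a system clock adjustment during the rehearsal cannot distort the measurement.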

The rehearsal catches things you don’t think to plan for. The certificate that doesn’t propagate fast enough. The DNS TTL that’s set too long. The cached connection pool that doesn’t drain on the rollback. We’ve seen each of these break a “we’ll just roll back” plan in a real go-live where they hadn’t been rehearsed.
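The DNS TTL case lends itself to a simple pre-flight check. The sketch below assumes, as a rough worst case, that resolvers may keep serving the old record for one full TTL after it is reverted, and that this wait counts against the four-minute rollback limit. Both the function name and that accounting are our illustration, not a fixed part of the playbook.

```python
def ttl_fits_rollback_budget(ttl_seconds, rollback_steps_seconds, budget_seconds=240):
    """True if the scripted rollback steps plus one full TTL of cache expiry fit the budget."""
    return rollback_steps_seconds + ttl_seconds <= budget_seconds

# A one-hour TTL cannot fit in a four-minute rollback, however fast the steps are.
print(ttl_fits_rollback_budget(ttl_seconds=3600, rollback_steps_seconds=90))
# A 60-second TTL leaves room for 90 seconds of scripted steps.
print(ttl_fits_rollback_budget(ttl_seconds=60, rollback_steps_seconds=90))
```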

Rule three — Phone-call escalation path active during the go-live window

Not Slack. Phone. The named operator running the go-live answers within 30 seconds. If they don’t, the named backup does. Both numbers are in the engagement letter; both phones are on and unsilenced for the go-live window.

Why phone over Slack? Because go-live decisions are time-critical and synchronous. “Should we proceed with the dual-write?” needs an answer in seconds, not in the time it takes someone to switch from email to Slack to read the message. Phone is faster than every async channel.

The go-live document

For every go-live we write a go-live document. Two pages, structured the same way each time.

Page one — Plan

  • Go-live window with timezone.
  • Pre-flight checklist.
  • Step-by-step sequence with command lines or click-paths.
  • Verification steps with explicit pass/fail criteria.
  • Rollback trigger criteria (what conditions abort the go-live).
  • Rollback procedure, step by step.

Page two — Roles and channels

Named operator running the go-live. Named backup. Operator-side approval gate (one human on the operator team who has to give a “go” before the go-live proceeds). Phone numbers for both sides. The Slack channel for non-urgent updates during the window. The post-go-live smoke-test plan with specific URLs and expected responses.
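A smoke-test plan with specific URLs and expected responses can be made executable. The sketch below assumes “expected response” means an HTTP status code; the URLs, the `run_smoke_tests` name, and the injectable `fetch` parameter are hypothetical. The usage example swaps in a dictionary lookup so it runs without touching the network.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def http_status(url, timeout=5):
    """Default fetcher: GET the URL and return the final HTTP status code.

    Redirects are followed, so this reports the status of the final response.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions in urllib

def run_smoke_tests(checks, fetch=http_status):
    """checks: list of (url, expected_status). Returns a list of failures."""
    failures = []
    for url, expected in checks:
        try:
            actual = fetch(url)
        except Exception as exc:  # a network error counts as a failed check
            actual = f"error: {exc}"
        if actual != expected:
            failures.append((url, expected, actual))
    return failures

# Usage with a fake fetcher standing in for real HTTP calls.
fake_responses = {
    "https://example.com/health": 200,
    "https://example.com/checkout": 500,
}
failures = run_smoke_tests(
    [("https://example.com/health", 200), ("https://example.com/checkout", 200)],
    fetch=fake_responses.get,
)
print(failures)
```

An empty list is the explicit pass; anything else names the URL, what was expected, and what came back.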

The document goes to the operator 48 hours before the go-live window for sign-off. Any changes flagged in sign-off get re-rehearsed against the copy.

What’s not negotiable

A few constraints we don’t relax for any reason.

We don’t go live without rehearsal. If the operator wants to skip rehearsal because “we’re under time pressure”, we postpone the go-live. The pressure doesn’t change the math — an unrehearsed go-live that fails costs more than the delay would have.

We don’t go live without an operator-side approval gate. Even our own named operator shouldn’t unilaterally trigger a state change on the operator’s production systems. There’s always a human on the operator’s team who says “go” before we proceed.

We don’t change the go-live window in the last 24 hours. If something comes up that would change the window, we postpone — we don’t compress.
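The two time constraints in this playbook, the 48-hour sign-off lead and the 24-hour window freeze, reduce to simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, the window date is made up, and a fixed UTC+10 offset stands in for a proper timezone database lookup.

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10), "AEST")  # fixed offset, assumed for the example
SIGN_OFF_LEAD = timedelta(hours=48)   # document reaches the operator this far ahead
WINDOW_FREEZE = timedelta(hours=24)   # no window changes inside this period

def sign_off_due(window_start):
    """Latest moment the go-live document should reach the operator."""
    return window_start - SIGN_OFF_LEAD

def window_change_allowed(window_start, now):
    """False once we are inside the final 24 hours before the window."""
    return window_start - now > WINDOW_FREEZE

window_start = datetime(2026, 4, 14, 14, 30, tzinfo=AEST)
print(sign_off_due(window_start).isoformat())
print(window_change_allowed(window_start, datetime(2026, 4, 13, 15, 0, tzinfo=AEST)))
```

When the check returns False, the only options left are to proceed as planned or to postpone.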

After the go-live

Forty-eight hours of heightened observation. The named operator stays close to the channels and dashboards. We monitor the project metric against its starting number, along with the system-level signals (error rates, p95 latency, and queue depth, depending on the project).

If the metric moves the way we predicted in the go-live plan, we close the go-live. If it doesn’t, the rollback gets considered against a documented decision tree — sometimes the right answer is to hold the new state and triage forward, sometimes the right answer is to roll back and re-run. Either way, the decision is documented.
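The first branch of such a decision tree is usually a comparison of observed signals against limits agreed in the plan. The sketch below assumes hypothetical limits and sample values. A breach only flags that rollback should be considered; it does not trigger one.

```python
from statistics import quantiles

def p95(latencies_ms):
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return quantiles(latencies_ms, n=20, method="inclusive")[-1]

def breached_signals(observed, limits):
    """Return the names of signals whose observed value exceeds its limit."""
    return sorted(name for name, value in observed.items() if value > limits[name])

# Hypothetical limits that would come from the rollback trigger criteria in the plan.
limits = {"error_rate": 0.01, "p95_latency_ms": 800, "queue_depth": 500}

latencies = [120, 130, 140, 150, 160, 170, 180, 190, 200, 2500]
observed = {
    "error_rate": 0.004,
    "p95_latency_ms": p95(latencies),
    "queue_depth": 640,
}
print(breached_signals(observed, limits))
```

The single slow request in the sample is enough to push p95 over its limit, which is why we watch p95 rather than the average.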

The takeaway

The go-live playbook is unglamorous because the discipline is in the rehearsal and the documentation, not in the heroics on the night. We’ve never had to do a heroics-on-the-night go-live, because the rehearsal catches what would have been heroics. That’s the whole point.

© 2026 Metrix. Operated by Metrix Australia 78 652 709 030
