Release Process

The CMS follows a pull-based, manually-gated release flow: every push to main ships to staging automatically, and a human triggers production when staging looks good.

Roles

Author — Writes code, opens PR, merges to main. The merge triggers a staging deploy automatically.
Releaser — Decides “staging is healthy enough, ship it.” Triggers the production workflow with a version bump.
Approver — Reviewer on the production GitHub Environment. Approves the workflow run before it touches prod. (Often the same person as the Releaser, but the two-person rule is what the Environment enforces if you configure multiple required reviewers.)

The flow

Merge to main. ci.yml runs build/test/lint and, on success, chains the deploy-staging job that ships to staging. Watch the Actions tab; check https://cms-staging.alokai-apps.com once it goes green.
Verify on staging. Run through the pre-deploy checklist.
Trigger production. Actions → Deploy to Production → Run workflow → choose version_bump:
- patch (default) — bug fixes, doc/UI tweaks, no schema changes.
- minor — new feature, additive schema, additive API.
- major — breaking schema or API change, or anything that requires a coordinated storefront update.
Approve. A required reviewer on the production GitHub Environment approves the run. The job stays paused until they do.
Workflow runs. It tests, bumps the version in apps/cms/package.json, applies any pending D1 migrations, deploys MCP, deploys CMS, smoke-checks /api/health, and only then commits the version bump, tags cms-vX.Y.Z, pushes, and creates a GitHub Release with auto-generated notes.
Verify. Confirm https://cms.alokai-apps.com/api/health returns the new version, the dashboard’s sidebar badge matches, and the Release shows up under GitHub Releases.

When a deploy fails

Failure	What happened	What to do
Test step	A unit test regressed on `main`.	Don’t bypass. Fix on `main`, let staging redeploy, retry the prod run.
Bump CMS version	Should not happen — uses a direct JSON rewrite. If it ever does, fix the inline node script and retry.	Investigate the failing log; the workflow stopped before any deploy work.
Backfill `d1_migrations` if needed	First-time run on a runtime-migrated DB; the auto-detect+backfill probe failed.	Run the manual backfill (`yarn workspace cms db:migrate:backfill:prod`) from a wrangler-authed shell, then re-trigger the workflow.
Apply D1 migrations	New migration is invalid or conflicts with prod schema.	Workflow stopped before deploying. Fix the migration on `main`, let staging redeploy and verify, then retry prod.
Deploy MCP worker	MCP build broken or wrangler config mismatch.	Workflow stopped before CMS was redeployed; prod is still on the previous CMS+MCP. Fix and retry.
Deploy CMS worker	New CMS bundle fails to deploy; old bundle still serves traffic.	Investigate; if MCP already deployed and is back-incompatible with the old CMS, roll MCP back via `wrangler rollback` (see Deployment).
Smoke check	Deploy went out but `/api/health` failed or returned the wrong version.	The new bundle is live and bad. Roll back via `wrangler rollback`. The version bump commit was not pushed, so re-running picks up cleanly after a fix.

Versioning policy

Patch (X.Y.Z+1) — default for everything that isn’t a feature or a breaking change. Hotfixes, doc-only changes, refactors, internal cleanups.
Minor (X.Y+1.0) — new feature you want users to notice, or any change that adds tables / columns / API endpoints (additive only).
Major (X+1.0.0) — coordinate-with-storefront changes: API contract breaks, removed columns, removed routes, auth-flow changes.

The version is always the version of apps/cms (the dashboard worker). @alokai/cms-api and @alokai/cms-cli have their own publish workflows (publish-api.yml, publish-cli.yml) — those are independent.

Hotfix flow

If prod breaks and a fix needs to ship before the staging cadence:

Branch from the last released tag (cms-vX.Y.Z), not main.
Apply the minimum fix.
Open a PR straight to main and merge once green.
Trigger Deploy to Production with version_bump=patch.

This skips the “soak in staging” step deliberately. Reserve it for actual incidents. After the hotfix lands, the next normal main push will re-roll staging onto the new tip, picking up the hotfix in the staging deploy too.