SOP-009: Staging to Production Promotion

Purpose

Promote a service from staging to production after all validation gates have passed. This process requires human approval; no autonomous production deployment is permitted.

Scope

Applies to all services that have reached STAGING_VERIFIED state in the pipeline. Covers pre-promotion checklist, human approval request, production deployment, and post-deploy validation.

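Prerequisites

- The service is in STAGING_VERIFIED state in ~/.claude/agent-memory/pipeline/state.md.
- Review reports exist under ~/dev/ops/reviews/{service}/: ba-report.json, architect-report.json, security-report.json, devops-report.json, and e2e-report.json.
- ~/.env.adlc defines COOLIFY_API_URL, COOLIFY_API_TOKEN, and SLACK_BOT_TOKEN.
- For Coolify-managed services, ~/dev/ops/coolify/{service}.json contains the app UUID and the repository is cloned at ~/dev/projects/{service}.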

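A preflight sketch that checks the inputs the procedure below reads: the environment variables used in steps 5 and 9, the review reports, and the Coolify config. The function name preflight is hypothetical, and it assumes ~/.env.adlc has already been sourced into the current shell. For services not managed by Coolify, drop the coolify/<service>.json entry.

```shell
# Hypothetical preflight check for SOP-009; not part of the ADLC CLI.
# ASSUMPTION: ~/.env.adlc has already been sourced into this shell.
preflight() {
  local service="$1" missing=0 var f
  # Environment variables used in steps 5 and 9
  for var in COOLIFY_API_URL COOLIFY_API_TOKEN SLACK_BOT_TOKEN; do
    if [ -z "${!var:-}" ]; then
      echo "MISSING env: $var"
      missing=1
    fi
  done
  # Files read in steps 1, 2 and 5
  for f in \
    "$HOME/dev/ops/reviews/$service/ba-report.json" \
    "$HOME/dev/ops/reviews/$service/architect-report.json" \
    "$HOME/dev/ops/reviews/$service/security-report.json" \
    "$HOME/dev/ops/reviews/$service/devops-report.json" \
    "$HOME/dev/ops/reviews/$service/e2e-report.json" \
    "$HOME/dev/ops/coolify/$service.json"; do
    if [ ! -f "$f" ]; then
      echo "MISSING file: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: preflight "$SERVICE" || echo "Fix the missing inputs before continuing".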
Procedure

1. Verify staging readiness

SERVICE="{service}"
PROJECT="{project}"

# Check pipeline state
grep "$SERVICE" ~/.claude/agent-memory/pipeline/state.md

# Verify all reviews
# Verify all reviews (a missing report is flagged instead of being silently skipped)
REVIEWS="$HOME/dev/ops/reviews/$SERVICE"
for report in ba-report.json architect-report.json security-report.json devops-report.json; do
  echo "--- $report ---"
  if [ ! -f "$REVIEWS/$report" ]; then
    echo "MISSING: $REVIEWS/$report"
    continue
  fi
  python3 -c "
import json
r = json.load(open('$REVIEWS/$report'))
print(json.dumps({k: r[k] for k in ['status','verdict','severity'] if k in r}, indent=2))
"
done

# Verify E2E (errors are no longer discarded, so a missing report is visible)
python3 -c "
import json
r = json.load(open('$REVIEWS/e2e-report.json'))
print(f'E2E verdict: {r.get(\"verdict\", \"MISSING\")}')
"
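The checks above print results for a person to read. A sketch of the same gate as one pass/fail function follows. The name staging_ready is hypothetical, and it assumes that a passing E2E report contains "verdict": "PASS"; this SOP does not define the verdict vocabulary, so adjust that comparison to the real values.

```shell
# Hypothetical readiness gate: returns non-zero if any report is missing or E2E did not pass.
# ASSUMPTION: a passing e2e-report.json contains "verdict": "PASS" (case-insensitive).
staging_ready() {
  local dir="$1" report ok=0
  for report in ba-report.json architect-report.json security-report.json \
                devops-report.json e2e-report.json; do
    if [ ! -f "$dir/$report" ]; then
      echo "MISSING: $report"
      ok=1
    fi
  done
  [ "$ok" -eq 0 ] || return 1
  # The path is passed as an argument instead of being interpolated into Python source
  python3 - "$dir/e2e-report.json" <<'PY'
import json, sys
verdict = str(json.load(open(sys.argv[1])).get("verdict", "")).upper()
print(f"E2E verdict: {verdict or 'MISSING'}")
sys.exit(0 if verdict == "PASS" else 1)
PY
}
```

For example: staging_ready "$HOME/dev/ops/reviews/$SERVICE" || echo "Not ready for promotion".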

2. Check for open security findings

python3 -c "
import json, sys
r = json.load(open('$HOME/dev/ops/reviews/$SERVICE/security-report.json'))
sev = str(r.get('severity', 'unknown')).upper()
if sev in ('HIGH', 'CRITICAL'):
    print(f'BLOCKED: Security severity {sev} -- cannot promote')
    sys.exit(1)
print(f'Security: {r.get(\"status\", \"unknown\")} ({sev}) -- OK to promote')
"

Flag the known open CVE so the approver can evaluate risk acceptance before promotion:

echo "CVE-2026-2005 (PostgreSQL pgcrypto HIGH): still open -- evaluate risk acceptance"
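The severity rule from this step can also be expressed in pure bash, which is convenient when the severity has already been extracted and a wrapping script should stop on a non-zero exit. A minimal sketch; the function name security_gate is hypothetical, and the rule (HIGH or CRITICAL blocks, anything else passes) is the one stated above, applied case-insensitively.

```shell
# Hypothetical pure-bash version of the step 2 severity gate.
# HIGH or CRITICAL (any letter case) blocks promotion; anything else passes.
security_gate() {
  local sev
  sev=$(printf '%s' "${1:-unknown}" | tr '[:lower:]' '[:upper:]')
  case "$sev" in
    HIGH|CRITICAL)
      echo "BLOCKED: Security severity $sev -- cannot promote"
      return 1 ;;
    *)
      echo "Security severity $sev -- OK to promote"
      return 0 ;;
  esac
}
```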

3. Request human approval

Generate a human review report:

cat << EOF | bash ~/dev/ops/adlc-v2/scripts/cli/write-human-review.sh $SERVICE deployment
{
  "title": "$SERVICE staging to production promotion",
  "summary": "$SERVICE has passed all review gates and E2E tests on staging. Ready for production deployment.",
  "timeline": [
    {"date": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", "event": "Promotion request generated", "agent": "orchestrator"}
  ],
  "blocker": {
    "description": "Production deployment requires human approval",
    "rootCause": "Company policy: no autonomous production deployments",
    "impact": "$SERVICE not available for production tenants until promoted",
    "severity": "HIGH"
  },
  "options": [
    {"label": "Promote to production", "description": "Deploy current staging build to production", "slackReply": "approved promote $SERVICE"},
    {"label": "Hold", "description": "Keep on staging for more validation", "slackReply": "hold $SERVICE"},
    {"label": "Reject", "description": "Issues found, needs more work", "slackReply": "rejected $SERVICE -- reason"}
  ]
}
EOF

This automatically:

- Generates an HTML report
- Uploads it to Google Drive
- Posts a Slack DM with the link and reply options
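The heredoc in this step expands $SERVICE and $(date) directly into JSON text, which produces invalid JSON if a value ever contains a quote or backslash. A sketch that builds the same payload with Python's json module so every value is escaped; the function name build_review_payload is hypothetical, and the fields are copied from the heredoc.

```shell
# Hypothetical helper: emit the step 3 approval payload as properly escaped JSON.
build_review_payload() {
  python3 - "$1" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" <<'PY'
import json, sys
svc, now = sys.argv[1], sys.argv[2]
print(json.dumps({
    "title": f"{svc} staging to production promotion",
    "summary": f"{svc} has passed all review gates and E2E tests on staging. "
               "Ready for production deployment.",
    "timeline": [{"date": now, "event": "Promotion request generated", "agent": "orchestrator"}],
    "blocker": {
        "description": "Production deployment requires human approval",
        "rootCause": "Company policy: no autonomous production deployments",
        "impact": f"{svc} not available for production tenants until promoted",
        "severity": "HIGH",
    },
    "options": [
        {"label": "Promote to production", "description": "Deploy current staging build to production",
         "slackReply": f"approved promote {svc}"},
        {"label": "Hold", "description": "Keep on staging for more validation",
         "slackReply": f"hold {svc}"},
        {"label": "Reject", "description": "Issues found, needs more work",
         "slackReply": f"rejected {svc} -- reason"},
    ],
}, indent=2))
PY
}
```

Usage: build_review_payload "$SERVICE" | bash ~/dev/ops/adlc-v2/scripts/cli/write-human-review.sh "$SERVICE" deployment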

4. Wait for human response

Do NOT proceed without explicit approval. The human will respond via Slack with one of:

- approved promote {service}: proceed to step 5
- hold {service}: keep monitoring staging and check back later
- rejected {service} -- {reason}: create fix tasks and re-enter the pipeline
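If the orchestrator reads the Slack reply programmatically, a strict parser ensures that nothing other than the exact approval form for this service counts as approval. A sketch with a hypothetical function name, parse_decision; it matches the three reply forms listed above exactly (no extra whitespace tolerated) and rejects a reply that names a different service.

```shell
# Hypothetical parser for the step 4 Slack reply. Prints PROMOTE, HOLD, or "REJECT: <reason>";
# anything else prints UNKNOWN and returns non-zero, so it can never be read as approval.
parse_decision() {
  local reply="$1" service="$2"
  case "$reply" in
    "approved promote $service")
      echo PROMOTE ;;
    "hold $service")
      echo HOLD ;;
    "rejected $service -- "?*)
      echo "REJECT: ${reply#"rejected $service -- "}" ;;
    *)
      echo UNKNOWN
      return 1 ;;
  esac
}
```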

5. Production deployment (after approval)

Production deployment details depend on the target infrastructure:

For Coolify-managed services:

source ~/.env.adlc
UUID=$(python3 -c "import json; print(json.load(open('$HOME/dev/ops/coolify/$SERVICE.json'))['coolify']['app_uuid'])")

# Update git branch to main (production)
curl -sf -X PATCH "$COOLIFY_API_URL/api/v1/applications/$UUID" \
  -H "Authorization: Bearer $COOLIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"git_branch": "main"}'

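# Merge staging to main: update local main first, and use --no-ff so the
# promotion is a single merge commit that the Rollback section can revert
cd ~/dev/projects/$SERVICE
git fetch origin
git checkout main
git pull --ff-only origin main
git merge --no-ff origin/staging --no-edit
git push origin main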

# Trigger rebuild
curl -sf -X POST "$COOLIFY_API_URL/api/v1/applications/$UUID/restart" \
  -H "Authorization: Bearer $COOLIFY_API_TOKEN"

For other deployment targets: Follow the specific deployment guide for the target platform.
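In the Coolify path of this step, the one-line UUID lookup fails with a Python traceback if the config file or key is missing, and an empty UUID would then produce a malformed API URL. A defensive sketch that reads the same coolify.app_uuid key and fails with a clear message; the function name coolify_uuid is hypothetical.

```shell
# Hypothetical helper: print coolify.app_uuid from ~/dev/ops/coolify/<service>.json,
# or print an error to stderr and return non-zero if the file or key is missing.
coolify_uuid() {
  local cfg="$HOME/dev/ops/coolify/$1.json"
  if [ ! -f "$cfg" ]; then
    echo "coolify_uuid: no config at $cfg" >&2
    return 1
  fi
  python3 - "$cfg" <<'PY'
import json, sys
try:
    uuid = json.load(open(sys.argv[1]))["coolify"]["app_uuid"]
except (KeyError, TypeError, ValueError) as exc:
    sys.exit(f"coolify_uuid: cannot read coolify.app_uuid: {exc!r}")
if not uuid:
    sys.exit("coolify_uuid: coolify.app_uuid is empty")
print(uuid)
PY
}
```

Usage: UUID=$(coolify_uuid "$SERVICE") || exit 1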

6. Post-deployment health check

PROD_URL="https://{service}.orbusdigital.com"
HEALTH="/health"

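# Poll every 10 s for up to 6 minutes (36 polls)
MAX_POLLS=36
code="000"
for i in $(seq 1 $MAX_POLLS); do
  # -f is omitted: curl already prints 000 via -w when it cannot connect,
  # and "|| echo 000" would append a second code to the one curl printed
  code=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" "${PROD_URL}${HEALTH}")
  echo "Poll $i/$MAX_POLLS: HTTP $code"
  [ "$code" = "200" ] && echo "PRODUCTION HEALTHY" && break
  sleep 10
done
[ "$code" = "200" ] || echo "PRODUCTION UNHEALTHY after $MAX_POLLS polls -- see Rollback"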
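Wrapping the poll above in a function that returns non-zero on timeout lets a calling script branch directly to the Rollback section. A sketch; the name wait_healthy and its parameters are hypothetical, and the defaults mirror this step (36 polls, 10 seconds apart).

```shell
# Hypothetical wrapper around the step 6 poll: returns 0 once the URL answers
# HTTP 200, or 1 after max_polls attempts.
wait_healthy() {
  local url="$1" max_polls="${2:-36}" interval="${3:-10}" i code
  for ((i = 1; i <= max_polls; i++)); do
    # curl prints 000 as the status when it cannot connect at all
    code=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" "$url")
    echo "Poll $i/$max_polls: HTTP $code"
    [ "$code" = "200" ] && return 0
    # No sleep after the final attempt
    [ "$i" -lt "$max_polls" ] && sleep "$interval"
  done
  return 1
}
```

Usage: wait_healthy "${PROD_URL}${HEALTH}" || echo "Production unhealthy: start Rollback"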

7. Run production smoke tests

Run a minimal E2E suite against production:

- Health check returns 200
- Authentication flow works (get a token, call a protected endpoint)
- Multi-tenancy isolation holds (a tenant cannot access another tenant's data)
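A sketch of the three checks as curl calls. Everything specific here is an assumption, not taken from this SOP: the /api/me protected endpoint, a TOKEN_A variable holding an already-obtained token for one tenant (the token endpoint is not specified, so obtaining it is left out), and an OTHER_TENANT_URL pointing at a resource owned by a different tenant. Isolation is treated as passing when that request returns 403 or 404.

```shell
# Hypothetical production smoke checks for step 7.
# ASSUMPTIONS: /api/me is a protected endpoint, TOKEN_A is a valid token for tenant A,
# and OTHER_TENANT_URL points at a resource owned by a different tenant.

# Compare an HTTP status against one or more acceptable codes and report PASS/FAIL.
smoke_result() {
  local name="$1" actual="$2" ok
  shift 2
  for ok in "$@"; do
    if [ "$actual" = "$ok" ]; then
      echo "PASS $name (HTTP $actual)"
      return 0
    fi
  done
  echo "FAIL $name (HTTP $actual, expected one of: $*)"
  return 1
}

# Print only the HTTP status; extra arguments (e.g. headers) are passed to curl
http_status() {
  curl -s -o /dev/null --max-time 10 -w "%{http_code}" "$@"
}

run_smoke() {
  local base="$1" failed=0
  smoke_result health "$(http_status "$base/health")" 200 || failed=1
  smoke_result auth "$(http_status -H "Authorization: Bearer $TOKEN_A" "$base/api/me")" 200 || failed=1
  smoke_result isolation "$(http_status -H "Authorization: Bearer $TOKEN_A" "$OTHER_TENANT_URL")" 403 404 || failed=1
  return "$failed"
}
```

Usage: run_smoke "$PROD_URL" || echo "Smoke tests failed: see Rollback"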

8. Update pipeline state

CLI="$HOME/dev/ops/adlc-v2/scripts/cli"
bash $CLI/write-pipeline-state.sh $PROJECT $SERVICE PROD_DEPLOYED "https://{service}.orbusdigital.com"
bash $CLI/write-status.sh $SERVICE deploy DONE "Production deployment complete"

9. Post milestone to Slack

source ~/.env.adlc
curl -sf -X POST "https://slack.com/api/chat.postMessage" \
  -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d "$(python3 -c "import json; print(json.dumps({'channel':'C0AN0N8AUGZ','text':':rocket: $SERVICE deployed to PRODUCTION -- https://{service}.orbusdigital.com'}))")"
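Slack's Web API returns HTTP 200 even when a call fails and reports the outcome in the JSON ok field (with an error string), so curl -f alone does not detect a failed post. A sketch of a check on the response body; the function name slack_ok is hypothetical.

```shell
# Hypothetical check: read a Slack Web API response on stdin, return 0 if "ok" is true,
# otherwise print Slack's "error" field to stderr and return 1.
slack_ok() {
  python3 -c '
import json, sys
try:
    resp = json.load(sys.stdin)
except ValueError:
    sys.exit("slack: response is not JSON")
if not isinstance(resp, dict):
    sys.exit("slack: unexpected response")
if resp.get("ok") is not True:
    sys.exit("slack: " + str(resp.get("error", "unknown error")))
'
}
```

To use it, append | slack_ok to the chat.postMessage curl command in this step.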

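Verification

- The production health endpoint returns HTTP 200 (step 6).
- All production smoke tests pass (step 7).
- The pipeline state for the service shows PROD_DEPLOYED (step 8).
- The milestone message appears in Slack (step 9).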

Rollback

If production deployment fails or introduces issues:

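  1. Immediate rollback (within 15 minutes):
cd ~/dev/projects/$SERVICE
git checkout main
git pull --ff-only origin main
# A promotion merge commit can only be reverted with -m 1 (keep main's side)
if git rev-parse -q --verify HEAD^2 >/dev/null; then
  git revert -m 1 HEAD --no-edit
else
  git revert HEAD --no-edit   # fast-forward promotion: reverts only the last commit
fi
git push origin main

# Trigger rebuild with reverted code (re-derive UUID in case this is a fresh shell)
source ~/.env.adlc
UUID=$(python3 -c "import json; print(json.load(open('$HOME/dev/ops/coolify/$SERVICE.json'))['coolify']['app_uuid'])")
curl -sf -X POST "$COOLIFY_API_URL/api/v1/applications/$UUID/restart" \
  -H "Authorization: Bearer $COOLIFY_API_TOKEN"
  2. Post-rollback: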

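When the promotion produced a merge commit (always the case with git merge --no-ff), a plain git revert HEAD fails because git needs to know which parent is the mainline; -m 1 selects main's side and undoes everything the staging merge brought in. A self-contained demonstration in a throwaway repository; the file name, contents, and commit messages are made up.

```shell
# Demonstration in a temporary repo: revert a --no-ff promotion merge with -m 1.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "v1" > app.txt
git add app.txt && git commit -q -m "release v1"
git checkout -q -b staging
echo "v2" > app.txt
git commit -q -am "feature v2"
git checkout -q main
git merge -q --no-ff --no-edit staging       # the promotion merge commit
echo "after promotion: $(cat app.txt)"
# A plain 'git revert HEAD' would fail here because HEAD has two parents
git revert --no-edit -m 1 HEAD >/dev/null
echo "after rollback: $(cat app.txt)"
```

Running it prints "after promotion: v2" followed by "after rollback: v1".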
References