Runbook: ODS Dashboard

Last updated: 2026-03-30
Service owner: ODS Platform Team
Pipeline status: STAGING_VERIFIED (intermittent 503s; see Common Issues)


1. Service Overview

ODS Dashboard is a web portal for monitoring the ADLC pipeline, project status, tests, reviews, and the kanban board. It reads data from the local filesystem (specs, ops outputs, project directories) and provides a real-time view of all ODS projects. It is read-only and does not modify any data.

Architecture: two-tier proxy setup.
- Coolify's Traefik on srv-staging terminates TLS and routes requests to a Caddy reverse proxy container on the same server.
- The Caddy container forwards requests over the WireGuard VPN to the agents server (10.204.0.2:3100).
- The dashboard container itself runs on the agents server (srv-agents), with ~/dev bind-mounted for filesystem data access.

| Property | Value |
|---|---|
| Language | Node.js 22 / TypeScript (Hono API + React UI) |
| API port | 3100 |
| UI port | 3101 |
| Package manager | pnpm |
| Architecture | Monorepo (packages/api + packages/ui) |
| Coolify app UUID | a04wo884sgwk04cws48kw8ss |

2. Health Check

Endpoint: GET /api/health (API server, port 3100)
Expected response: HTTP 200

# From external (via Traefik/Caddy proxy):
curl -sf https://dashboard.staging.orbusdigital.com/api/health

# From agents server directly (bypass proxy):
curl -sf http://localhost:3100/api/health
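Right after a deploy or restart the API may not answer immediately, so a single curl can fail even though the service is about to come up. A minimal polling sketch, assuming a POSIX shell with curl; the function name wait_for_health and the defaults (10 attempts, 3 seconds apart) are illustrative choices, not part of the service:

```shell
# Hypothetical helper: poll URL until curl -f succeeds or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP >= 400 as failure; --max-time: bound each probe to 5s
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    # No need to sleep after the last attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempt(s): $url" >&2
  return 1
}
```

For example, after `docker compose -f docker-compose.prod.yml up -d --build`, run `wait_for_health http://localhost:3100/api/health 20 3`.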

3. Staging URL

https://dashboard.staging.orbusdigital.com

Authentication: HTTP Basic Auth, with credentials set by the BASIC_AUTH_USER and BASIC_AUTH_PASS environment variables.
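When a request comes back 401, it can help to confirm what the client actually sends. HTTP Basic Auth puts "Basic " plus base64("user:pass") in the Authorization header; curl builds this itself when given `-u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS"`. A small sketch of that encoding, with a hypothetical helper name and placeholder credentials (alice/secret are not real values):

```shell
# Hypothetical helper: build the Basic Auth header value for USER and PASS.
basic_auth_header() {
  # printf '%s:%s' encodes no trailing newline; tr removes base64 line wrapping
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header alice secret
```

Whether /api/health sits behind Basic Auth is not stated here; if the external health check returns 401, add `-u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS"` to the curl command.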


4. Environment Variables

| Variable | Required | Description |
|---|---|---|
| PORT | No | API server port (default 3100) |
| UI_PORT | No | UI server port (default 3101) |
| NODE_ENV | No | production or development |
| DATABASE_URL | No | PostgreSQL connection string for optional dashboard-specific data |
| BASIC_AUTH_USER | Yes | Basic Auth username |
| BASIC_AUTH_PASS | Yes | Basic Auth password |
| DATA_DIR | Yes | Root directory for filesystem data (e.g., /data in the container, mapped from ~/dev on the host) |
| WS_ENABLED | No | Enable WebSocket real-time updates (default true) |
| LOG_LEVEL | No | Log level (default info) |
| CORS_ORIGINS | No | Allowed CORS origins |
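A quick pre-flight check for the table above: fail if a required variable is missing and print the effective values of the defaulted ones. This is a sketch with a hypothetical function name; the defaults mirror the table, and whether the app applies them exactly this way is an assumption. Run it in a shell where the variables are exported, for example inside the container via `docker exec -it ods-dashboard sh` (assuming the image ships a POSIX shell).

```shell
# Hypothetical helper: validate required vars and show effective defaults.
check_dashboard_env() {
  local missing=0 var
  for var in BASIC_AUTH_USER BASIC_AUTH_PASS DATA_DIR; do
    # printenv only sees exported variables, i.e. what a child process receives
    if [ -z "$(printenv "$var")" ]; then
      echo "missing required variable: $var" >&2
      missing=1
    fi
  done
  echo "PORT=${PORT:-3100} UI_PORT=${UI_PORT:-3101} WS_ENABLED=${WS_ENABLED:-true} LOG_LEVEL=${LOG_LEVEL:-info}"
  return "$missing"
}
```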

5. How to Deploy

Dashboard container (on agents server)

The dashboard runs locally on the agents server via Docker Compose:

cd /home/jniox_orbusdigital_com/dev/projects/ods-dashboard
docker compose -f docker-compose.prod.yml up -d --build

Caddy proxy (on srv-staging via Coolify)

The Coolify app (a04wo884sgwk04cws48kw8ss) builds from the coolify-proxy branch, which contains a Caddy reverse proxy that routes to 10.204.0.2:3100 (agents server via WireGuard VPN).

To restart the proxy via the Coolify API:

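source ~/.env.adlc 2>/dev/null
# Stop with a clear message if the token did not load (otherwise curl fails silently)
: "${COOLIFY_TOKEN:?COOLIFY_TOKEN is not set; check ~/.env.adlc}"
curl -fsS -X POST "https://app.coolify.io/api/v1/applications/a04wo884sgwk04cws48kw8ss/restart" \
  -H "Authorization: Bearer $COOLIFY_TOKEN"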

6. How to Check Logs

Dashboard container (agents server – this machine)

docker logs --tail 200 -f ods-dashboard

Caddy proxy (srv-staging)

# SSH to srv-staging then:
docker logs --tail 200 -f $(docker ps -qf "label=coolify.applicationId=a04wo884sgwk04cws48kw8ss")
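Tailing 200 lines by hand is slow during an incident. A minimal filter sketch that keeps only lines mentioning error, warn, or fatal (case-insensitive); the function name and the keyword list are assumptions, since the log format is not documented here:

```shell
# Hypothetical helper: keep only error/warning lines from a log stream on stdin.
log_problems() {
  # -i: case-insensitive; -E: extended regex; "|| true" so no match is not an error
  grep -iE 'error|warn|fatal' || true
}
```

For example, `docker logs --since 1h ods-dashboard 2>&1 | log_problems`. The `2>&1` matters because `docker logs` replays the container's stderr on its own stderr.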

7. Common Issues and Fixes

Issue: Intermittent 503 “no available server” (RECURRING)

Symptom: https://dashboard.staging.orbusdigital.com returns 503 or HTTP 000 (curl's code for "no HTTP response", e.g. connection refused or timeout). Other services on the same IP (35.195.54.220) work fine.

Likely causes: Traefik on srv-staging loses its route to the Caddy proxy container, or the Caddy container itself restarts. The WireGuard VPN tunnel between srv-staging and srv-agents may also drop temporarily.

Diagnosis:

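# On srv-agents: check that the dashboard container is healthy:
curl -sf http://localhost:3100/api/health

# Check the WireGuard tunnel from the srv-staging side
# (10.204.0.2 is srv-agents' own tunnel address, so pinging it locally proves nothing):
ssh srv-staging ping -c 3 10.204.0.2

# Check the Caddy proxy container on srv-staging:
ssh srv-staging docker ps | grep dashboard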

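Fix:
1. If the local container is down: docker compose -f ~/dev/projects/ods-dashboard/docker-compose.prod.yml up -d
2. If the Caddy proxy is down on srv-staging: restart it via Coolify (see section 8, How to Restart).
3. If WireGuard is down: check the latest handshake with sudo wg show and restart the interface if needed.
4. The issue often self-recovers within 15-30 minutes, but not always: the 2026-03-31 outage lasted from 02:00 to 08:45.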
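The diagnosis steps for this issue can be chained so the first broken hop is reported directly. A sketch under these assumptions: it runs on srv-agents with non-interactive SSH access to srv-staging (as the diagnosis commands already assume), `ping -W` uses Linux iputils syntax, the public /api/health check may need `-u` if Basic Auth covers it, and all function names are hypothetical.

```shell
# Hypothetical hop checks, from the inside of the request path outwards.
check_app()    { curl -fs --max-time 5 -o /dev/null http://localhost:3100/api/health; }
check_tunnel() { ssh -o BatchMode=yes -o ConnectTimeout=5 srv-staging ping -c 3 -W 2 10.204.0.2 >/dev/null 2>&1; }
check_proxy()  { ssh -o BatchMode=yes -o ConnectTimeout=5 srv-staging docker ps 2>/dev/null | grep -q dashboard; }
check_public() { curl -fs --max-time 10 -o /dev/null https://dashboard.staging.orbusdigital.com/api/health; }

# Run the checks in order and stop at the first failure.
diagnose_503() {
  local hop
  for hop in app tunnel proxy public; do
    if ! "check_$hop"; then
      echo "FAIL at hop: $hop"
      return 1
    fi
    echo "ok: $hop"
  done
  echo "all hops healthy"
}
```

If app, tunnel, and proxy pass but public fails, the likely culprit is Traefik's routing on srv-staging, matching the first suspected cause above.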

History: This has been a recurring issue since deployment. Observed on 2026-03-29 (17:00), 2026-03-30 (10:15, 21:30), and 2026-03-31 (02:00-08:45, 13:15). Escalated via Slack DM.

Issue: Stale data in dashboard

Symptom: Dashboard shows outdated pipeline state.

Fix: The dashboard reads directly from the filesystem, so first verify that the bind mount is working (the listing should match ~/dev/ops/outputs/ on the host):

docker exec ods-dashboard ls /data/ops/outputs/
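It also helps to tell apart "the pipeline stopped writing" from "the dashboard stopped showing new data". A sketch that checks whether anything under a directory changed recently; the function name and the 60-minute default are arbitrary assumptions, and the host path follows from /data mapping to ~/dev:

```shell
# Hypothetical helper: succeed if any file under DIR changed in the last MAX_MINUTES.
# Usage: outputs_fresh DIR [MAX_MINUTES]
outputs_fresh() {
  local dir="$1" max="${2:-60}"
  # -mmin -N matches files modified less than N minutes ago (GNU and BSD find)
  if [ -n "$(find "$dir" -type f -mmin "-$max" 2>/dev/null | head -n 1)" ]; then
    echo "fresh: files changed in $dir within ${max}m"
  else
    echo "stale: nothing changed in $dir within ${max}m" >&2
    return 1
  fi
}
```

Run `outputs_fresh ~/dev/ops/outputs 60` on the agents server. If the host directory is stale, the dashboard is faithfully showing old data; if it is fresh but the dashboard is not, look at the bind mount or the running container.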

Issue: WebSocket disconnects

Symptom: Real-time updates stop working, UI shows stale data.

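Fix: Check that the API server is running and that WebSockets are enabled. If grep prints nothing, WS_ENABLED is unset and the default (true) applies.

curl -sf http://localhost:3100/api/health
docker exec ods-dashboard env | grep WS_ENABLED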

8. How to Restart

Dashboard container (agents server)

docker restart ods-dashboard
# Or full rebuild:
cd /home/jniox_orbusdigital_com/dev/projects/ods-dashboard
docker compose -f docker-compose.prod.yml up -d --build

Caddy proxy (srv-staging)

# Via Coolify API:
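source ~/.env.adlc 2>/dev/null
# Stop with a clear message if the token did not load (otherwise curl fails silently)
: "${COOLIFY_TOKEN:?COOLIFY_TOKEN is not set; check ~/.env.adlc}"
curl -fsS -X POST "https://app.coolify.io/api/v1/applications/a04wo884sgwk04cws48kw8ss/restart" \
  -H "Authorization: Bearer $COOLIFY_TOKEN"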

9. Dependencies

| Dependency | Type | Details |
|---|---|---|
| Filesystem (~/dev) | Data source | Bind-mounted read-only as /data:ro in the container |
| WireGuard VPN | Network | Tunnel between srv-agents (10.204.0.2) and srv-staging |
| Caddy proxy | Routing | Coolify app on srv-staging; forwards requests to the agents server over WireGuard |
| Traefik | Ingress | Coolify's Traefik on srv-staging; terminates TLS |
| PostgreSQL 17 | Optional | Dashboard-specific persistence, if used |

The dashboard calls no upstream application services; it only reads from the filesystem.
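Since the read-only guarantee rests on the /data mount being mounted :ro, it can be checked from `docker inspect`, whose Mounts entries carry Destination and RW fields. A sketch with hypothetical function names; the container name ods-dashboard comes from the commands above:

```shell
# Hypothetical helper: print each mount of the container as "DESTINATION RW".
list_mounts() {
  docker inspect -f '{{range .Mounts}}{{.Destination}} {{.RW}}{{println}}{{end}}' "${1:-ods-dashboard}"
}

# Hypothetical helper: read "DESTINATION RW" lines on stdin;
# succeed only if /data is present and mounted read-only (RW=false).
data_mount_is_ro() {
  awk '$1 == "/data" { found = 1; if ($2 == "false") ro = 1 } END { exit !(found && ro) }'
}
```

Usage: `list_mounts | data_mount_is_ro && echo "/data is read-only"`.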


10. Monitoring / Alerting