ODS Platform -- System Reference


Version: 1.0
Generated: 2026-03-22
Scope: Complete reference for ADLC + PDLC autonomous pipelines


Table of Contents

  1. System Architecture
  2. ADLC Pipeline Flow
  3. PDLC Pipeline Flow
  4. Agent Registry
  5. Status File Format Standard
  6. Dispatcher State Machine
  7. Service Classification
  8. External Dependencies
  9. Resolver Protocol
  10. Slack Channels and Communication
  11. Known Issues and Mitigations
  12. File System Map

1. System Architecture

Overview

The ODS Platform runs two autonomous pipelines on a single server, each managed by a Claude Code session inside a tmux window, driven by bash-based dispatchers running on systemd timers.

                     +-----------------------------------+
                     |           Linux Server            |
                     |  /home/jniox_orbusdigital_com/    |
                     +-----------------------------------+
                              |              |
               +--------------+              +------------+
               |                                          |
   +-----------v-----------+                   +-----------v-----------+
   |   ADLC Pipeline       |                   |   PDLC Pipeline       |
   |   (Development)       |                   |   (Product)           |
   +-----------+-----------+                   +-----------+-----------+
               |                                           |
   +-----------v-----------+                   +-----------v-----------+
   | tmux: ods-claude      |                   | tmux: ods-pdlc        |
   | Claude supervisor     |                   | Claude supervisor     |
   | CLAUDE.md rules       |                   | CLAUDE.md rules       |
   +-----------+-----------+                   +-----------+-----------+
               |                                           |
   +-----------v-----------+                   +-----------v-----------+
   | dispatcher-v3.sh      |                   | dispatcher-pdlc.sh    |
   | systemd timer: 5 min  |                   | systemd timer: 10 min |
   | deterministic bash    |                   | deterministic bash    |
   +-----------+-----------+                   +-----------+-----------+
               |                                           |
   +-----------v-----------+                   +-----------v-----------+
   | session-health.sh     |  <--- shared ---> | session-health.sh     |
   | systemd timer: 1 min  |                   | systemd timer: 1 min  |
   +-----------------------+                   +-----------------------+

Two Pipelines

| Pipeline | Purpose | tmux Session | Dispatcher | Interval | CLAUDE.md |
|---|---|---|---|---|---|
| ADLC | Autonomous Development Lifecycle | ods-claude | dispatcher-v3.sh | 5 min | ~/dev/CLAUDE.md |
| PDLC | Product Development Lifecycle | ods-pdlc | dispatcher-pdlc.sh | 10 min | PDLC CLAUDE.md |

Bash Dispatchers

dispatcher-v3.sh (ADLC, every 5 minutes):
- Consolidates feature branches into dev
- Reads status files for every service in ~/dev/projects/
- Runs deterministic state transitions (tests, status recovery)
- Injects complex decisions into Claude via tmux send-keys
- Processes Slack inbox messages
- Detects external blockers (missing .env vars, missing Coolify configs)
- Spawns resolver for systemic issues (2+ services blocked on same cause)
- Posts kanban summary every 6th run (30 min)

dispatcher-pdlc.sh (PDLC, every 10 minutes):
- Processes PDLC Slack inbox messages
- Checks ADLC state for GTM/Analytics triggers (services on staging)
- Pokes PDLC Claude if idle (no agents running, waiting at prompt)
- Posts PDLC kanban summary every 6th run (60 min)

Session Health (session-health.sh, every 1 minute)

Monitors the ods-claude tmux session:
1. Session alive check: if tmux session missing, creates new one and boots Claude
2. Idle detection: hashes pane content; if unchanged for 15 consecutive minutes with no agents running, kills and restarts session
3. Memory check: if available memory < 512MB, kills largest non-supervisor Claude process
4. Session age check: if session is > 12 hours old AND > 200 interactions, restarts when no agents are running
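The idle-detection logic can be sketched as follows (a simplified sketch: the real session-health.sh hashes live `tmux capture-pane` output each minute; here the snapshot is passed in as a string, and the function name and state-file layout are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Sketch: count consecutive checks where the pane snapshot hash is unchanged.
IDLE_LIMIT=15   # minutes of identical pane content before restart

check_idle() {
    # $1 = current pane snapshot, $2 = state file holding "last_hash count"
    local snapshot="$1" state_file="$2"
    local hash last_hash count
    hash=$(printf '%s' "$snapshot" | md5sum | cut -d' ' -f1)
    read -r last_hash count < "$state_file" 2>/dev/null || { last_hash=""; count=0; }
    if [ "$hash" = "$last_hash" ]; then
        count=$((count + 1))    # same content as last minute
    else
        count=0                 # pane changed, reset the idle counter
    fi
    echo "$hash $count" > "$state_file"
    # Caller kills and restarts the session when this prints RESTART
    if [ "$count" -ge "$IDLE_LIMIT" ]; then echo "RESTART"; else echo "OK"; fi
}
```

The hash-and-count approach avoids storing full pane snapshots and makes the check cheap enough to run every minute.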

Slack Bridges

| Bridge | Script | Inbox Directory | Channel Monitored |
|---|---|---|---|
| ADLC | slack-bridge.sh | ~/dev/ops/slack-inbox/ | C0AN0N8AUGZ (ADLC), D0AGRAVEC1K (DM) |
| PDLC | pdlc-slack-bridge.sh | ~/dev/ops/pdlc-slack-inbox/ | C0AN42N3C0L (PDLC) |

Bridges poll Slack for new messages and write them as files to the inbox directory. Dispatchers pick them up and inject them into the appropriate Claude session.

Daily Restart

At 4:00 AM UTC, a systemd timer triggers a full restart:
- Kills both tmux sessions
- Starts fresh sessions
- Claude reads CLAUDE.md and runs /boot (ADLC) or /boot-pdlc (PDLC)
- /boot reconstructs state from: agent-memory files, progress.md per project, git state per service, interrupted RUNNING status files, system resources
- Starts pipeline loop


2. ADLC Pipeline Flow

Complete Pipeline Sequence

Dev (code) --> Tests (bash) --> BA (review) --> Architect+Security+DevOps (parallel reviews)
    --> PR (create+merge) --> Provisioner (if needed) --> Deploy (staging)
    --> Scenario (generate tests) --> E2E (execute tests)

Step-by-Step Detail

Step 1: Development

Trigger: Dispatcher detects pending tasks in progress.md with no dev.status file, or Claude receives /dev-task command.

Agent: dev (model: opus, maxTurns: 50)

Execution:
1. Dev agent reads spec at ~/dev/specs/$PROJECT/specs/$SERVICE/spec.md
2. Creates branch feat/$TASK_ID from dev
3. TDD cycle: write failing test, implement, refactor, repeat
4. Commits with conventional format: feat($SERVICE): description [$TASK_ID]
5. Pushes feature branch

Output files:
- Git commits on feat/$TASK_ID branch
- Update to ~/dev/specs/$PROJECT/gestion/progress.md: - [x] DEV: $TASK_ID -- $DESCRIPTION ($DATE)
- Update to ~/.claude/agent-memory/pipeline/state.md: $PROJECT/$TASK_ID: DEV_COMPLETE @ ISO-timestamp

Status file written: None directly. The dispatcher’s branch consolidation step merges feature branches into dev, then sets $SERVICE-dev.status.

On success: Branch exists with commits. Dispatcher proceeds to tests.
On failure: If tests fail 3 times on the same issue, the agent stops and reports a blocker. The circuit breaker increments the crash counter.


Step 2: Branch Consolidation (Dispatcher, automatic)

Trigger: Every dispatcher run (5 min), before pipeline scan.

Executor: dispatcher-v3.sh consolidate_branches() function (pure bash).

Execution:
1. For each service in ~/dev/projects/*/:
   - Auto-commit any uncommitted changes on feature branches: git add -A && git commit -m "wip: auto-commit $branch"
   - Switch to dev branch
   - Merge each feat/* branch with --no-edit
   - Delete merged feature branches
   - Push dev to origin
2. If merge conflict: abort merge, log conflict, post to Slack DM

Output: Consolidated dev branch with all feature work merged.
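The merge loop at the heart of this step can be sketched in bash (a simplified sketch of consolidate_branches(): the real dispatcher also auto-commits work-in-progress on each feature branch, pushes dev to origin, and posts conflicts to Slack, which is omitted here):

```shell
#!/usr/bin/env bash
# Sketch: merge every feat/* branch into dev, deleting branches that merge cleanly.
consolidate_branches() {
    local repo="$1"
    git -C "$repo" checkout -q dev
    for branch in $(git -C "$repo" for-each-ref --format='%(refname:short)' refs/heads/feat/); do
        if git -C "$repo" merge --no-edit -q "$branch"; then
            git -C "$repo" branch -d "$branch" >/dev/null   # merged, safe to delete
        else
            git -C "$repo" merge --abort                    # leave the branch for a human
            echo "CONFLICT: $branch" >&2
        fi
    done
}
```

Because conflicted merges are aborted rather than resolved, the dev branch is never left in a half-merged state between dispatcher runs.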


Step 3: Tests (Bash)

Trigger: $SERVICE-dev.status = DONE AND no $SERVICE-test.status file exists.

Executor: test-runner.sh (pure bash, zero Claude tokens).

Execution:

bash ~/dev/ops/adlc-v2/scripts/test-runner.sh $PROJECT $SERVICE

Status file written:
- Success: echo "PASS" > ~/dev/ops/outputs/$SERVICE-test.status
- Failure: echo "FAIL" > ~/dev/ops/outputs/$SERVICE-test.status

On success: Dispatcher proceeds to BA review.
On failure: Crash counter is incremented. If < 3 crashes, the stage may retry. If >= 3, the circuit breaker triggers.
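The crash counter and circuit breaker can be sketched as (a sketch: the function name is an assumption, but the counter file matches the $SERVICE.crashes convention described in Section 5, with the directory passed in instead of hard-coding ~/dev/ops/outputs):

```shell
#!/usr/bin/env bash
# Sketch: increment the per-service crash counter; open the circuit at 3.
record_failure() {
    local service="$1" outdir="$2"
    local crashes=0
    [ -f "$outdir/$service.crashes" ] && crashes=$(cat "$outdir/$service.crashes")
    crashes=$((crashes + 1))
    echo "$crashes" > "$outdir/$service.crashes"   # plain integer, per Section 5
    if [ "$crashes" -ge 3 ]; then
        echo "CIRCUIT_OPEN"   # stop retrying, escalate instead
    else
        echo "RETRY"
    fi
}
```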


Step 4: BA Review

Trigger: $SERVICE-test.status = PASS AND no $SERVICE-ba.status file exists.

Agent: ba (model: sonnet, maxTurns: 35)

Dispatcher action: Sets $SERVICE-ba.status to RUNNING, then injects into Claude:

Spawn /agent ba for $SERVICE. PROJECT: $PROJECT. Spec: ~/dev/specs/$PROJECT/specs/$SERVICE/spec.md

Execution:
1. BA agent reads FULL spec, extracts every acceptance criterion (AC-001, AC-002, …)
2. Reads the code, checks last-reviewed-commit.txt for incremental diff
3. For each criterion: finds implementing code, records file+line as evidence
4. Evaluates each: MET, PARTIAL, MISSING, DEVIATION, N/A
5. For app services: verifies API contracts, database schema, events match spec exactly
6. For infra services: verifies library API, deployment config, topic definitions

Output files:
- ~/dev/ops/reviews/$SERVICE/ba-report.json (JSON with criteria, deviations, verdict)
- ~/.claude/agent-memory/pipeline/state.md: $PROJECT/$SERVICE: BA_PASS|BA_FAIL @ timestamp

Status file format:

DONE | 2026-03-22T10:30:00+00:00 | ba | $SERVICE | from-json-report

or

FAILED | 2026-03-22T10:30:00+00:00 | ba | $SERVICE | from-json: non-compliant

Verdict rules:
- compliant: criteriaMissing == 0 AND criteriaPartial == 0 AND no HIGH/CRITICAL deviations
- non-compliant: criteriaMissing > 0 OR any HIGH/CRITICAL deviation

On success (compliant): Dispatcher proceeds to parallel reviews.
On failure (non-compliant):
- Dispatcher sets $SERVICE-ba.status to TRIAGING and injects the failure into Claude for analysis
- Claude reads the BA report JSON to determine whether missing criteria map to pending tasks (not a real failure) or to completed tasks (a real failure requiring a dev fix)

Anti-rubber-stamp rules:
- Every criterion MUST have evidence (file path + line number)
- If implementing code cannot be found, mark MISSING (never assume)
- Read FULL spec every time (never rely on memory)
- Never suggest code changes (only report deviations)
- Minimum review depth: read every controller/route file and every test file


Step 5: Architect + Security + DevOps Reviews (Parallel)

Trigger: $SERVICE-ba.status = DONE AND no $SERVICE-review.status file exists.

Dispatcher action: Sets $SERVICE-review.status to RUNNING, then injects into Claude:

Spawn /agent architect + /agent security + /agent devops for $SERVICE

All three run in parallel (if memory > 2000MB).

5a. Architect Review

Agent: architect (model: sonnet, maxTurns: 30)

8 mandatory checks (ALL must pass):
1. Schema Isolation – service uses own DB schema only
2. Inter-Service Communication – Redpanda events only, no direct HTTP between ODS services
3. Multi-Tenancy – tenant_id from JWT, RLS on all tables, tenant_id in all CloudEvents
4. Layer Structure – Controllers > Services > Repositories separation
5. No Hardcoded URLs – all external endpoints via env vars
6. Header Propagation – Authorization, X-Tenant-Id, X-Correlation-Id, X-Source-Service forwarded
7. CloudEvents Compliance – specversion, type, source, id, time, tenantid fields present
8. Error Handling – service-specific exception filters, correlation ID in logs, no stack traces in responses

Output: ~/dev/ops/reviews/$SERVICE/architect-report.json
Verdict: Single FAIL on any check = overall FAIL. Every check needs evidence.

5b. Security Review

Agent: security (model: sonnet, maxTurns: 30)

OWASP Top 10 checks:
- A01 Injection, A02 Broken Auth, A03 Sensitive Data, A04 XXE, A05 Access Control
- A06 Security Misconfiguration, A07 XSS, A08 Insecure Deserialization
- A09 Known Vulnerabilities (npm audit), A10 Insufficient Logging

Automated scans run first: npm audit, secrets scan (grep), hardcoded URLs scan, .gitignore check.

Output: ~/dev/ops/reviews/$SERVICE/security-report.json
Verdict:
- clean: all checks PASS/N/A, no npm audit critical/high, no secrets
- concerns: any WARN, medium/low npm advisories
- critical: any FAIL, critical/high npm advisory, any secret in code

CRITICAL or HIGH severity = automatic FAIL, PR must NOT merge.

5c. DevOps Review

Agent: devops (model: sonnet, maxTurns: 30, mode: review)

Mandatory validation (all executed):
1. Docker build test – docker build -t test-$SERVICE .
2. .dockerignore check
3. Health endpoint verification (grep for health/readiness/liveness)
4. Structured logging check
5. Env vars documentation (.env.example)
6. Migrations check
7. Test suite execution

Output: ~/dev/ops/reviews/$SERVICE/devops-report.json
Verdict: PASS, FAIL, or PASS_WITH_NOTES

After all three reviews complete: Dispatcher checks $SERVICE-review.status. If all three JSON report files exist, it sets the status to DONE, then validates the verdicts:

BA=compliant AND ARCH=PASS AND SEC=(clean|concerns with severity!=HIGH/CRITICAL) AND DEVOPS=(PASS|PASS_WITH_NOTES)

If all pass: proceed to PR. If any fails: set $SERVICE-review.status to FAILED, increment crashes.
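The combined gate can be sketched as a bash predicate (a sketch: the real dispatcher extracts these verdict strings from the four JSON reports; here they are passed in directly, and the function name mirrors the all_reviews_pass() mentioned in Step 6):

```shell
#!/usr/bin/env bash
# Sketch: return 0 only when every review verdict clears the gate.
all_reviews_pass() {
    local ba="$1" arch="$2" sec="$3" sec_sev="$4" devops="$5"
    [ "$ba" = "compliant" ] || return 1
    [ "$arch" = "PASS" ] || return 1
    case "$sec" in
        clean) ;;                                   # always acceptable
        concerns)
            case "$sec_sev" in
                HIGH|CRITICAL) return 1 ;;          # concerns only pass below HIGH
            esac ;;
        *) return 1 ;;                              # critical or unknown verdict
    esac
    case "$devops" in PASS|PASS_WITH_NOTES) ;; *) return 1 ;; esac
    return 0
}
```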


Step 6: PR Creation and Merge

Trigger: $SERVICE-review.status = DONE AND all_reviews_pass() returns true AND no $SERVICE-pr.status file exists.

Agent: pr (model: sonnet, maxTurns: 15)

Execution:
1. Reads all four JSON review reports programmatically (never trusts markdown)
2. Validates: BA=compliant, ARCH=PASS, SEC not critical, DEVOPS=PASS/PASS_WITH_NOTES
3. If ANY report MISSING or FAIL: ABORT, do not create PR

Auto-merge safeguards (ALL must pass for auto-merge):
- All reviews PASS (not just PASS_WITH_NOTES)
- Lines changed <= 500
- No migration files in diff
- No cross-service/shared lib changes
- Security severity = NONE

If all pass: gh pr merge --squash --auto
If any fails: create PR with label human-review-required, post to Slack DM, do NOT merge.
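The safeguard check itself is deterministic and can be sketched as (a sketch: the argument names are assumptions; in practice the values would come from the PR diff and the review reports):

```shell
#!/usr/bin/env bash
# Sketch: every safeguard must hold, or the PR goes to human review.
can_auto_merge() {
    local all_pass="$1" lines_changed="$2" has_migrations="$3" \
          cross_service="$4" sec_severity="$5"
    [ "$all_pass" = "true" ] || return 1         # PASS only, not PASS_WITH_NOTES
    [ "$lines_changed" -le 500 ] || return 1     # small diffs only
    [ "$has_migrations" = "false" ] || return 1  # schema changes need a human
    [ "$cross_service" = "false" ] || return 1   # no shared-lib blast radius
    [ "$sec_severity" = "NONE" ] || return 1
    return 0
}
```

Keeping the gate as a pure predicate means the dispatcher can log exactly which safeguard failed before labeling the PR human-review-required.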

Output:
- GitHub PR on staging branch
- Status update to ~/.claude/agent-memory/pipeline/state.md: $PROJECT/$SERVICE: STAGING_DEPLOYED @ timestamp
- Update to ~/dev/specs/$PROJECT/gestion/progress.md

Status file: $SERVICE-pr.status = DONE | date | pr | $SERVICE | PR merged

Abort conditions: Any review MISSING/FAIL, CI fails, merge conflicts, security HIGH/CRITICAL.


Step 7: Provisioner (if needed)

Trigger: $SERVICE-pr.status = DONE AND no $SERVICE-deploy.status AND no Coolify config file (~/dev/ops/coolify/$SERVICE.json).

Agent: provisioner (model: sonnet, maxTurns: 30)

Execution (based on service type from service-project-map.json):

| Service Type | Action |
|---|---|
| web-service | Create Coolify Application (Dockerfile build pack) |
| infrastructure | Create Coolify Service (docker-compose) |
| script | No Coolify app. Mark DONE immediately. |

Resources provisioned:
- PostgreSQL schema + user (psql)
- GCS bucket + service account (gcloud)
- Coolify app (Coolify API)
- GitHub repo (gh)
- .env.staging file

Post-provision check (MANDATORY before marking DONE):
- web-service: trigger initial deploy, health check (2 min timeout)
- infrastructure: check containers running
- script: verify project builds

Output:
- ~/dev/ops/coolify/$SERVICE.json (Coolify config with appUuid, URLs, registry)
- ~/dev/ops/reviews/$SERVICE/provisioner-report.json
- ~/dev/projects/$SERVICE/.env.staging

Status file:

DONE | date | provisioner | $SERVICE | verified-operational

or

PROVISION_INCOMPLETE

On failure: PROVISION_INCOMPLETE or BLOCKED, escalate to Slack DM.

What requires human: COOLIFY_API_TOKEN, external API credentials (Stripe, SendGrid, CinetPay), DNS wildcard, SMTP credentials.

What does NOT require human: Coolify app creation, PostgreSQL schema, GCS bucket, GitHub repo, env vars, TLS certs.


Step 8: Deploy to Staging

Trigger: $SERVICE-pr.status = DONE AND Coolify config exists AND no $SERVICE-deploy.status.

Agent: devops (model: sonnet, maxTurns: 30, mode: deploy)

Execution:
1. Build and tag Docker image
2. Push to registry (if configured)
3. Deploy to Coolify via API (restart application)
4. Health check loop (every 10s, up to 3 min)
5. Smoke test
6. Write deploy report

If health fails: rollback via Coolify API, notify Slack DM.

Output:
- ~/dev/ops/reviews/$SERVICE/deploy-report.json
- ~/dev/projects/$SERVICE/.env.staging

Status file:

DONE | date | deploy | $SERVICE | script-type: no permanent deploy

or

DEPLOYED | date | devops | $SERVICE | $STAGING_URL

Step 9: Scenario Generation

Trigger: Deploy health check passes (sequential, must wait for deploy).

Agent: scenario (model: sonnet, maxTurns: 25)

Execution:
1. Reads spec and all review files
2. Generates API scenarios (scenarios.json)
3. Generates browser scenarios (browser-scenarios.json) if service has UI
4. Writes mock data SQL and cleanup SQL

Mandatory API scenario categories: happy-path, multi-tenancy, auth, validation, error
Mandatory browser scenario categories (if UI): user-journey, multi-tenancy, responsive, error-display, navigation

Output:
- ~/dev/projects/$SERVICE/tests/e2e/scenarios.json
- ~/dev/projects/$SERVICE/tests/e2e/browser-scenarios.json
- ~/dev/projects/$SERVICE/tests/e2e/mock-data.sql
- ~/dev/projects/$SERVICE/tests/e2e/cleanup.sql

Status: $PROJECT/$SERVICE: SCENARIOS_READY @ timestamp
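The actual scenarios.json schema is not documented in this reference; an illustrative entry (field names are assumptions, not the real schema) covering two of the mandatory categories might look like:

```json
{
  "service": "oid",
  "scenarios": [
    {
      "id": "S-001",
      "category": "happy-path",
      "method": "POST",
      "path": "/api/v1/documents",
      "auth": "valid-token",
      "expectStatus": 201
    },
    {
      "id": "S-002",
      "category": "multi-tenancy",
      "method": "GET",
      "path": "/api/v1/documents/{other-tenant-doc-id}",
      "auth": "tenant-a-token",
      "expectStatus": 404
    }
  ]
}
```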


Step 10: E2E Tests

Trigger: Scenario agent completes (sequential after scenarios).

Agent: e2e-test (model: sonnet, maxTurns: 30)

Tools: Full Playwright MCP tool suite + Read, Write, Bash, Glob, Grep

Pre-check: Verify staging URL is reachable via curl $STAGING_URL/health. If not: ABORT with E2E_BLOCKED.

Phase 1 – API E2E Tests: Execute each scenario from scenarios.json using curl against staging URL. Test categories: happy path, auth (expired/wrong/no token), multi-tenancy (cross-tenant access blocked), input validation, error handling.

Phase 2 – Browser E2E Tests: Execute browser scenarios using Playwright MCP tools: browser_navigate, browser_fill_form, browser_click, browser_wait_for, browser_snapshot, browser_take_screenshot. Save screenshots to ~/dev/ops/reviews/$SERVICE/screenshots/.

Phase 3 – Cleanup: Run cleanup.sql against test database.

Output:
- ~/dev/ops/reviews/$SERVICE/e2e-report.json
- Screenshots in ~/dev/ops/reviews/$SERVICE/screenshots/

Verdict rules:
- All API + browser tests pass: E2E_PASS
- Any test fails: E2E_FAIL
- Staging not reachable: E2E_BLOCKED
- Console JS errors during browser tests: E2E_FAIL (even if assertions pass)
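The verdict rules reduce to a small decision function (a sketch with assumed inputs: reachability of the staging URL, the failed-test count, and the console-error count observed during browser runs):

```shell
#!/usr/bin/env bash
# Sketch: compute the E2E verdict from the three observations above.
e2e_verdict() {
    local reachable="$1" failures="$2" console_errors="$3"
    if [ "$reachable" != "true" ]; then
        echo "E2E_BLOCKED"; return     # staging down: blocked, not failed
    fi
    # Console JS errors fail the run even when every assertion passed.
    if [ "$failures" -gt 0 ] || [ "$console_errors" -gt 0 ]; then
        echo "E2E_FAIL"; return
    fi
    echo "E2E_PASS"
}
```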


3. PDLC Pipeline Flow

Complete Pipeline Sequence

Discovery --> Spec Writing --> Prioritization --> Validation --> ADLC Handoff
    --> (wait for ADLC) --> GTM Prep --> Analytics

Pipeline States

DISCOVERY --> SPEC_WRITING --> PRIORITIZED --> VALIDATING --> VALIDATED
  --> SUBMITTED_TO_ADLC (passive wait) --> ADLC_ACTIVE (passive wait)
  --> ADLC_STAGING --> GTM --> ANALYTICS

Step-by-Step Detail

Step 1: Discovery

Trigger: New market signal, user feedback, stakeholder request, or Slack command discover {topic}.

Agent: discovery (model: sonnet, maxTurns: 25)

Tools: Read, Bash, Glob, Grep, WebSearch, WebFetch

Execution:
1. Understand the request
2. Research: WebSearch for market data, competitor analysis, industry trends
3. Analyze existing context: read PROJECT.md, existing specs, business-rules.md
4. Identify gaps in platform capabilities
5. Write opportunity brief

Output:
- ~/dev/specs/$PROJECT/pdlc/opportunities/$OPPORTUNITY_NAME.md (markdown brief with problem statement, market evidence, proposed solution, target users, business impact, risks, recommendation)
- ~/dev/specs/$PROJECT/pdlc/opportunities/$OPPORTUNITY_NAME.json (structured summary)

Recommendation values: GO, NO_GO, NEEDS_MORE_DATA
Priority values: CRITICAL, HIGH, MEDIUM, LOW

Status file: echo "RUNNING | $(date) | discovery | $SERVICE" > ~/dev/ops/outputs/$SERVICE-discovery.status

On success: Proceed to spec writing (if recommendation=GO).

Slack: Posts to PDLC channel (C0AN42N3C0L).


Step 2: Spec Writing

Trigger: Discovery recommendation = GO AND no spec exists for target service.

Agent: spec-writer (model: opus, maxTurns: 40)

Execution:
1. Read opportunity brief
2. Read architecture.md and business-rules.md for constraints
3. Read 1-2 existing specs for format and depth reference
4. Write the spec

Spec structure (11 mandatory sections):
1. Objectif
2. API Endpoints (method, path, request/response types, status codes, auth, multi-tenancy)
3. Data Model (tables, columns, types, indexes, RLS, relationships)
4. Events/CloudEvents (types, payload schema, topics)
5. Business Rules (validation, edge cases, error handling)
6. Acceptance Criteria (numbered AC-001..AC-N, Given/When/Then, minimum 10)
7. Non-Functional Requirements (performance, security, multi-tenancy)
8. Dependencies
9. Service Classification (MANDATORY – type, stack, deploy mode, staging domain)
10. Infrastructure Requirements (MANDATORY – resource table with auto-provisionable flag)
11. Out of Scope

Output:
- ~/dev/specs/$PROJECT/specs/$SERVICE/spec.md
- Updates ~/dev/ops/agents/service-project-map.json with new entry
- Updates ~/dev/specs/$PROJECT/gestion/backlog.md

Quality rules: Every endpoint must have types. Every table must have tenant_id + RLS. Every AC must be testable. Minimum 10 ACs. Section 9 and 10 are mandatory.

Status file: echo "RUNNING | $(date) | spec-writer | $SERVICE" > ~/dev/ops/outputs/$SERVICE-spec-writer.status

Slack: Posts to PDLC channel.


Step 3: Prioritization

Trigger: New spec written, not yet prioritized.

Agent: prioritization (model: sonnet, maxTurns: 20)

RICE Scoring:
- Reach (1-10): users/tenants affected per quarter
- Impact (0.25, 0.5, 1, 2, 3): Minimal to Massive
- Confidence (0.5, 0.8, 1.0): Low, Medium, High
- Effort (1-10): agent-weeks
- Score = (Reach x Impact x Confidence) / Effort
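The score formula involves fractional Impact and Confidence values, which bash integer arithmetic cannot handle; a small awk wrapper computes it (a sketch; the agent itself does this reasoning rather than calling a script):

```shell
#!/usr/bin/env bash
# RICE score as defined above: (Reach x Impact x Confidence) / Effort.
rice_score() {
    # $1=reach $2=impact $3=confidence $4=effort
    awk -v r="$1" -v i="$2" -v c="$3" -v e="$4" \
        'BEGIN { printf "%.2f\n", (r * i * c) / e }'
}
```

For example, a feature with Reach 8, Impact 2, Confidence 0.8, and Effort 4 scores (8 x 2 x 0.8) / 4 = 3.2.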

Output:
- Updates ~/dev/specs/$PROJECT/gestion/backlog.md (ranked table with RICE scores, phase assignments P0-P3)
- ~/dev/specs/$PROJECT/pdlc/prioritization-$DATE.json

Phase groupings: P0 (foundation, blocks everything), P1 (core, enabled by P0), P2 (growth), P3 (nice-to-have)

Slack: Posts top 3 items and current phase count to PDLC channel.


Step 4: Validation

Trigger: Spec prioritized, not yet validated.

Agent: validation (model: sonnet, maxTurns: 25)

Validation checklist:
- Spec completeness (6 checks): API types, DB schema with tenant_id+RLS, 10+ testable ACs, CloudEvents, dependencies, out of scope
- Technical feasibility: architecture alignment, no circular deps, dependencies exist/planned, no stack conflicts
- Effort estimation: T-shirt sizing (S/M/L/XL), broken down by API/DB/logic/tests/integration
- Risk assessment: technical, integration, security, scope risks
- Dependency map: blocked-by, blocks, shared libs, schema conflicts
- Infrastructure readiness: Section 9 exists, auto-provisionable vs human resources identified, env vars match .env.example

Output: ~/dev/specs/$PROJECT/pdlc/validations/$SERVICE-validation.json

Verdict rules:
- APPROVED: spec complete, feasible, no HIGH risks, dependencies satisfied
- NEEDS_REVISION: spec incomplete or has addressable issues – sent back to spec-writer
- BLOCKED: depends on unbuilt services, not feasible with current stack

On APPROVED:
- Creates handoff file: ~/dev/specs/$PROJECT/pdlc/handoffs/$SERVICE-handoff.json
- Posts to ADLC channel: spec ready for development

On NEEDS_REVISION: Posts details to PDLC channel. Spec-writer agent may be re-spawned for revision.

Slack: Posts to PDLC channel.


Step 5: ADLC Handoff

Trigger: Validation verdict = APPROVED.

Execution (by PDLC orchestrator):
1. Ensure spec.md exists
2. Write handoff file:

{
  "service": "$SERVICE",
  "project": "$PROJECT",
  "date": "ISO-timestamp",
  "state": "SUBMITTED_TO_ADLC",
  "specPath": "~/dev/specs/$PROJECT/specs/$SERVICE/spec.md"
}
3. Post to ADLC Slack channel: :package: PDLC->ADLC handoff: $SERVICE spec ready for development
4. Update pipeline-state.md: $SERVICE: SUBMITTED_TO_ADLC @ timestamp

After handoff – passive wait rules:
- PDLC MUST NOT re-submit, re-validate, re-discover, or spawn any agent for this service
- PDLC only reads ADLC state passively every 10 minutes

| ADLC State Detected | PDLC Action |
|---|---|
| No dev status yet (< 24h) | Wait. Normal. |
| No dev status yet (> 24h) | Post reminder to ADLC channel |
| Dev RUNNING or DONE | Update to ADLC_ACTIVE |
| STAGING_DEPLOYED or PR merged | Update to ADLC_STAGING, spawn GTM |
| E2E_PASS | Post to DM: ready for prod promotion |
| PROD_DEPLOYED | Spawn Analytics agent |
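The passive-wait poll is a straightforward state-to-action mapping (a sketch: the state names and action labels here are illustrative; the real dispatcher-pdlc.sh derives the ADLC state from status files and state.md rather than receiving it as a string):

```shell
#!/usr/bin/env bash
# Sketch: map the detected ADLC state to the PDLC action from the table above.
pdlc_action() {
    local adlc_state="$1" age_hours="$2"
    case "$adlc_state" in
        NONE)   # no dev status file yet
            if [ "$age_hours" -gt 24 ]; then
                echo "POST_REMINDER"
            else
                echo "WAIT"
            fi ;;
        DEV_RUNNING|DEV_DONE)  echo "SET_ADLC_ACTIVE" ;;
        STAGING_DEPLOYED)      echo "SET_ADLC_STAGING_AND_SPAWN_GTM" ;;
        E2E_PASS)              echo "POST_DM_PROD_READY" ;;
        PROD_DEPLOYED)         echo "SPAWN_ANALYTICS" ;;
        *)                     echo "WAIT" ;;
    esac
}
```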

Step 6: GTM (Go-to-Market)

Trigger: ADLC deploys service to staging (dispatcher-pdlc.sh detects STAGING_DEPLOYED in ADLC state and no GTM brief exists).

Agent: gtm (model: sonnet, maxTurns: 20)

Tools: Read, Bash, Glob, Grep, WebSearch

Output: ~/dev/specs/$PROJECT/pdlc/gtm/$SERVICE-gtm.md with:
- Positioning (what, who, value prop, differentiation)
- Feature summary
- Rollout plan (internal testing, beta, GA, rollback)
- Documentation needs (API docs, user guide, admin guide, migration guide, changelog)
- Communication plan
- Success metrics (adoption, engagement, quality, business)
- Risks and mitigations

Slack: Posts to PDLC channel.


Step 7: Analytics

Trigger: Service deployed to production (future: PROD_DEPLOYED state).

Agent: analytics (model: sonnet, maxTurns: 20)

Data platform: ClickHouse (OLAP), Metabase (BI)

Output: ~/dev/specs/$PROJECT/pdlc/analytics/$SERVICE-kpis.md with:
- KPI definitions (adoption rate, API latency p95, error rate, etc.)
- Tracking plan (CloudEvents via Redpanda)
- ClickHouse tables (materialized views)
- Metabase dashboards (Operations, Product, Business)
- Success criteria checklist
- Review schedule

Slack: Posts KPI count, event count, dashboard count to PDLC channel.


4. Agent Registry

ADLC Agents

dev

| Property | Value |
|---|---|
| Model | opus |
| Max Turns | 50 |
| Tools | Read, Write, Edit, Bash, Glob, Grep |
| Input | Spec (~/dev/specs/$PROJECT/specs/$SERVICE/spec.md), service CLAUDE.md, existing code |
| Output | Git commits on feat/$TASK_ID branch, progress.md update, state.md update |
| Status format | $PROJECT/$TASK_ID: DEV_COMPLETE @ ISO-timestamp (in state.md) |
| Pass criteria | All tests pass, code committed and pushed |
| Fail criteria | Tests fail 3 times on same issue |
| Forbidden | Modify docker-compose.yml, change DB schemas outside Prisma, hardcode secrets |

ba

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 35 |
| Tools | Read, Bash, Glob, Grep |
| Input | Spec, architecture.md, business-rules.md, service code |
| Output | ~/dev/ops/reviews/$SERVICE/ba-report.json |
| Status format | $PROJECT/$SERVICE: BA_PASS\|BA_FAIL @ ISO-timestamp |
| Pass criteria | criteriaMissing == 0 AND criteriaPartial == 0 AND no HIGH/CRITICAL deviations |
| Fail criteria | criteriaMissing > 0 OR any HIGH/CRITICAL deviation |
| Forbidden | Suggest code changes, write code, mark criteria MET without evidence |

architect

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Bash, Glob, Grep |
| Input | architecture.md, business-rules.md, spec, service code |
| Output | ~/dev/ops/reviews/$SERVICE/architect-report.json |
| Status format | $PROJECT/$SERVICE: ARCHITECT_PASS\|ARCHITECT_FAIL @ ISO-timestamp -- X/8 checks |
| Pass criteria | All 8 architectural checks pass |
| Fail criteria | Single FAIL on any check |
| Forbidden | Write production code, mark PASS without evidence |

security

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Bash, Glob, Grep |
| Input | Service code |
| Output | ~/dev/ops/reviews/$SERVICE/security-report.json |
| Status format | $PROJECT/$SERVICE: SECURITY_PASS\|SECURITY_FAIL @ ISO-timestamp -- OWASP X/10, severity=Y |
| Pass criteria | All OWASP checks PASS/N/A, no npm critical/high, no secrets |
| Fail criteria | Any FAIL, any critical/high npm advisory, any secret in code |
| Forbidden | Write code, skip automated scans, assume zero findings |

devops

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Write, Bash, Glob, Grep |
| Input | Service code, Dockerfile, .env.example |
| Output (review mode) | ~/dev/ops/reviews/$SERVICE/devops-report.json |
| Output (deploy mode) | ~/dev/ops/reviews/$SERVICE/deploy-report.json, .env.staging |
| Status format (review) | Contributes to $SERVICE-review.status |
| Status format (deploy) | DEPLOYED\|DEPLOY_FAILED\|FIRST_DEPLOY_NEEDED in deploy-report.json |
| Pass criteria (review) | Docker builds, health endpoint exists, tests pass |
| Fail criteria | Docker build fails, no health endpoint, tests fail |
| Forbidden | Write application code |

pr

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 15 |
| Tools | Read, Bash, Glob, Grep |
| Input | All JSON review reports in ~/dev/ops/reviews/$SERVICE/ |
| Output | GitHub PR, state.md update, progress.md update |
| Status format | $PROJECT/$SERVICE: STAGING_DEPLOYED @ ISO-timestamp |
| Pass criteria | All reviews pass, PR created/merged |
| Fail criteria | Any review MISSING/FAIL, CI fails, merge conflicts |
| Forbidden | Write application code, merge with security HIGH/CRITICAL, force merge conflicts |

provisioner

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Write, Bash, Glob, Grep |
| Input | Spec (sections 9+10), service-project-map.json, .env.example |
| Output | ~/dev/ops/coolify/$SERVICE.json, provisioner-report.json, .env.staging, DB schema, GCS bucket, GitHub repo |
| Status format | DONE \| date \| provisioner \| $SERVICE \| verified-operational or PROVISION_INCOMPLETE |
| Pass criteria | All resources created AND post-provision health check passes |
| Fail criteria | Cannot create resource, health check fails |
| Forbidden | Write application code, mark DONE before health check |

scenario

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 25 |
| Tools | Read, Write, Edit, Bash, Glob, Grep, Playwright (navigate, snapshot, screenshot) |
| Input | Spec, review files |
| Output | scenarios.json, browser-scenarios.json, mock-data.sql, cleanup.sql in ~/dev/projects/$SERVICE/tests/e2e/ |
| Status format | $PROJECT/$SERVICE: SCENARIOS_READY @ ISO-timestamp -- X api, Y browser scenarios |
| Forbidden | Write application code |

e2e-test

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Write, Bash, Glob, Grep, full Playwright MCP suite (24 tools) |
| Input | Scenario files, staging URL, mock data |
| Output | ~/dev/ops/reviews/$SERVICE/e2e-report.json, screenshots |
| Status format | E2E_PASS\|E2E_FAIL\|E2E_BLOCKED |
| Forbidden | Write application code |

auditor

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 30 |
| Tools | Read, Bash, Glob, Grep |
| Input | All status files, all review reports, registry, git history |
| Output | ~/dev/ops/reviews/adlc-audit-YYYYMMDD-HHMM.json, last-audit.md |
| Status format | N/A (audit is a cross-cutting concern) |
| Pass criteria | Zero CRITICAL/HIGH findings = COMPLIANT |
| Triggers | Every 6 hours (automated), on-demand via Slack, after major milestones |
| Audit scope | Pipeline sequence compliance, review report quality (anti-rubber-stamp), registry consistency, orchestrator behavior (not coding directly), blocked/failed services, test coverage |
| Forbidden | Write application code, modify review reports |

resolver

| Property | Value |
|---|---|
| Model | opus |
| Max Turns | 40 |
| Tools | Read, Write, Edit, Bash, Glob, Grep |
| Input | Blocked/failed status files, pipeline.log, external-blockers.log, agent definitions, skill definitions, specs |
| Output | ~/dev/ops/reviews/resolver/RES-$DATE-$SEQ.json, changelog.md, resolver-fixes.log, resolver-monitor-*.json |
| Status format | N/A (operates above both pipelines) |
| CAN modify | Agent definitions, skill definitions, dispatcher scripts, spec-writer templates, CLAUDE.md rules, provisioner behavior, service-project-map.json |
| CANNOT modify | Application source code, spec content for specific services, review reports, git history, systemd units |

PDLC Agents

discovery

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 25 |
| Tools | Read, Bash, Glob, Grep, WebSearch, WebFetch |
| Input | Market signals, user feedback, existing specs, business-rules.md |
| Output | ~/dev/specs/$PROJECT/pdlc/opportunities/$NAME.md + .json |
| Forbidden | Write specs, write code |

spec-writer

| Property | Value |
|---|---|
| Model | opus |
| Max Turns | 40 |
| Tools | Read, Bash, Glob, Grep |
| Input | Opportunity brief, architecture.md, business-rules.md, existing specs |
| Output | ~/dev/specs/$PROJECT/specs/$SERVICE/spec.md, updates to service-project-map.json and backlog.md |
| Forbidden | Write code |

prioritization

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 20 |
| Tools | Read, Bash, Glob, Grep |
| Input | Pending specs, opportunity briefs, backlog, roadmap, business-rules.md |
| Output | Updated backlog.md, prioritization-$DATE.json |
| Forbidden | Write specs, write code |

validation

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 25 |
| Tools | Read, Bash, Glob, Grep |
| Input | Spec, architecture.md, business-rules.md, existing services, ADLC state |
| Output | ~/dev/specs/$PROJECT/pdlc/validations/$SERVICE-validation.json, handoff file on APPROVED |
| Forbidden | Write code, write specs |

gtm

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 20 |
| Tools | Read, Bash, Glob, Grep, WebSearch |
| Input | Spec, ADLC state, PROJECT.md |
| Output | ~/dev/specs/$PROJECT/pdlc/gtm/$SERVICE-gtm.md |
| Forbidden | Write code, write specs |

analytics

| Property | Value |
|---|---|
| Model | sonnet |
| Max Turns | 20 |
| Tools | Read, Bash, Glob, Grep |
| Input | Spec, GTM brief, ADLC state |
| Output | ~/dev/specs/$PROJECT/pdlc/analytics/$SERVICE-kpis.md |
| Forbidden | Write code |

5. Status File Format Standard

THE Definitive Format

All status files are written to ~/dev/ops/outputs/ and follow this format:

STATUS | ISO-date | agent-type | service | details

Example:

DONE | 2026-03-22T10:30:00+00:00 | ba | oid | from-json-report
FAILED | 2026-03-22T11:00:00+00:00 | security | docstore | severity=CRITICAL, findings=3
RUNNING | 2026-03-22T09:00:00+00:00 | dev | pdf-engine | task T-042
BLOCKED_EXTERNAL | 2026-03-22T08:00:00+00:00 | provisioner | billing-engine | need Stripe API key
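A status line can be produced mechanically. The sketch below shows the line shape only; in production, agents write status files through the write-status.sh CLI (section 14), which also validates the keyword enum. The helper name write_status_line is illustrative, not a platform script.

```shell
# Illustrative helper (not a platform script): emits one line in the
# STATUS | ISO-date | agent-type | service | details format.
write_status_line() {
    local status="$1" agent="$2" service="$3" details="$4"
    printf '%s | %s | %s | %s | %s\n' \
        "$status" "$(date -Is)" "$agent" "$service" "$details"
}

# e.g. redirect into ~/dev/ops/outputs/oid-ba.status
write_status_line DONE ba oid from-json-report
```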

Valid STATUS Values

Status Meaning
DONE Stage completed successfully
PASS Tests or checks passed
FAIL Tests or checks failed
FAILED Agent or stage failed
RUNNING Agent currently executing
BLOCKED Blocked on internal dependency
BLOCKED_EXTERNAL Blocked on external resource (human action needed)
PAUSED Manually paused
TRIAGING BA failure being analyzed by orchestrator
DEPLOYED Successfully deployed to staging
PROVISION_INCOMPLETE Provisioner created resources but health check failed

INVALID Values

Anything not in the above list is considered corrupted. The dispatcher’s read_status() function validates the first word (the surrounding wrapper is reconstructed here for context; the case list is the authoritative part):

read_status() {
    local actual_file="$1"
    [ -f "$actual_file" ] || { echo ""; return; }

    local first_word
    first_word=$(awk '{print $1; exit}' "$actual_file")

    case "$first_word" in
        DONE|PASS|FAIL|FAILED|RUNNING|BLOCKED|BLOCKED_EXTERNAL|PAUSED|TRIAGING|DEPLOYED|PROVISION_INCOMPLETE|"")
            echo "$first_word"
            ;;
        *)
            # Corrupted -- log and auto-delete
            rm "$actual_file"
            echo ""
            ;;
    esac
}

Invalid examples: JSON blobs, service names, partial data, markdown, anything an agent writes that is not a standard keyword.

Status File Naming Convention

~/dev/ops/outputs/$SERVICE-$STAGE.status

Where $STAGE is one of: dev, test, ba, review, pr, deploy, provisioner, discovery, spec-writer, prioritization, validation, gtm, analytics

Additional files:

- $SERVICE.crashes – crash counter (plain integer)
- $SERVICE-dev-$TASKID.status – per-task dev status (glob fallback)

Stale RUNNING Detection

The dispatcher resets any RUNNING status file older than 30 minutes (agent likely crashed or was killed by session restart):

if [ "$stage_status" = "RUNNING" ]; then
    stage_age=$(( $(date +%s) - $(stat -c '%Y' "$stage_file") ))
    if [ "$stage_age" -gt 1800 ]; then
        rm "$stage_file"
    fi
fi

6. Dispatcher State Machine

ADLC Dispatcher (dispatcher-v3.sh)

Execution Order (every 5 minutes)

  1. Branch consolidation (consolidate_branches)
  2. Pipeline scan (process_pipeline)
  3. Slack inbox (process_slack_inbox)
  4. External blocker detection (detect_external_blockers)
  5. Resolver trigger (maybe_spawn_resolver)
  6. Kanban summary (maybe_post_kanban, every 6th run)

State Transitions

For each service in ~/dev/projects/*/ (must have .git):

              +-----------+
              | No status |
              +-----+-----+
                    |
                    v
              +-----------+
              | dev: DONE |
              +-----+-----+
                    |
    [bash: test-runner.sh, zero tokens]
                    |
          +---------v---------+
          |  test: PASS/FAIL  |
          +---------+---------+
                    |
              (if PASS)
                    |
    [inject: "Spawn /agent ba"]
    [write: ba.status = RUNNING]
                    |
          +---------v---------+
          |   ba: DONE/FAILED |
          +---------+---------+
                    |
         DONE               FAILED
           |                  |
           |     [inject: "BA FAIL, analyze"]
           |     [write: ba.status = TRIAGING]
           |                  |
    [inject: "Spawn reviews"] |
    [write: review.status = RUNNING]
           |
    +---------v---------+
    | review: DONE/FAIL |
    +---------+---------+
              |
   (if DONE AND all_reviews_pass)
              |
    [inject: "Spawn /agent pr"]
    [write: pr.status = RUNNING]
              |
    +---------v---------+
    |   pr: DONE/FAIL   |
    +---------+---------+
              |
      (if DONE, based on service type)
              |
    +----+----+--------+
    |    |             |
  script infra     web-service
    |    |             |
  DONE  +------+-------+
               |
     [has coolify config?]
        |            |
       YES          NO
        |            |
    [deploy]    [provisioner]
        |            |
    +----v----+  +----v----+
    |DEPLOYED |  |PROVISION|
    +---------+  +---------+

What the Dispatcher Reads

Source What It Extracts
~/dev/ops/outputs/$SERVICE-$STAGE.status First word = status keyword
~/dev/ops/reviews/$SERVICE/*.json JSON verdicts (fallback if no .status file)
~/dev/ops/outputs/$SERVICE.crashes Crash count (circuit breaker)
~/dev/ops/agents/service-project-map.json Service -> project mapping, service type, deploy mode
~/dev/ops/coolify/$SERVICE.json Coolify config existence check
~/dev/ops/slack-inbox/*.txt Slack messages to inject

Decision Logic (Pure Bash)

The dispatcher NEVER uses Claude for decisions. It uses:

- read_status() – extract and validate the first word from a status file
- all_reviews_pass() – parse all 4 JSON review files with python3 one-liners
- get_project(), get_service_type(), get_deploy_mode() – read from service-project-map.json
- get_crashes(), inc_crashes() – crash counter management
- review_verdict() – extract a JSON field from a review report
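A minimal sketch of all_reviews_pass(), assuming each report carries a top-level "verdict" field whose passing value is "pass" — the real report schemas may differ. REVIEWS_DIR is parameterized here for illustration; the dispatcher reads ~/dev/ops/reviews/ directly.

```shell
# Sketch only: the field name "verdict" and value "pass" are assumptions
# about the report schema.
REVIEWS_DIR="${REVIEWS_DIR:-$HOME/dev/ops/reviews}"

all_reviews_pass() {
    local service="$1" agent report
    for agent in ba architect security devops; do
        report="$REVIEWS_DIR/$service/$agent-report.json"
        [ -f "$report" ] || return 1    # missing report = not passed
        python3 -c 'import json,sys; sys.exit(0 if json.load(open(sys.argv[1])).get("verdict")=="pass" else 1)' "$report" \
            || return 1
    done
    return 0
}
```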

What the Dispatcher Injects into Claude

The dispatcher sends text commands to Claude via tmux send-keys. Examples:

Spawn /agent ba for oid. PROJECT: ods-platform. Spec: ~/dev/specs/ods-platform/specs/oid/spec.md
BA FAIL for docstore. Read ~/dev/ops/reviews/docstore/ba-report.json. If missing criteria map to pending tasks, spawn dev agents. If real failure, spawn dev fix.
Spawn /agent architect + /agent security + /agent devops for pdf-engine. PROJECT: ods-platform.
Spawn /agent provisioner for notification-hub. PROJECT: ods-platform. TYPE: web-service. Create Coolify application.
Spawn /agent resolver. SYSTEMIC BLOCKER DETECTED: 3 services blocked. Top causes: missing Coolify config.

What Status Files the Dispatcher Creates/Modifies

| Action | File Written |
| --- | --- |
| Tests pass | $SERVICE-test.status = PASS |
| Tests fail | $SERVICE-test.status = FAIL |
| Spawn BA | $SERVICE-ba.status = RUNNING |
| BA FAIL triage | $SERVICE-ba.status = TRIAGING |
| Spawn reviews | $SERVICE-review.status = RUNNING |
| Spawn PR | $SERVICE-pr.status = RUNNING |
| Spawn deploy | $SERVICE-deploy.status = RUNNING |
| Spawn provisioner | $SERVICE-provisioner.status = RUNNING |
| Script service (no deploy) | $SERVICE-deploy.status = DONE \| date \| deploy \| $SERVICE \| script-type |
| Recover BA from JSON | $SERVICE-ba.status = DONE \| date \| ba \| $SERVICE \| from-json-report |
| Recover reviews from JSON | $SERVICE-review.status = DONE \| date \| reviews \| $SERVICE \| from-json-reports |
| Stale RUNNING (>30min) | Deletes the status file |
| Corrupted status | Deletes the status file |

PDLC Dispatcher (dispatcher-pdlc.sh)

Execution Order (every 10 minutes)

  1. Process PDLC Slack inbox
  2. Check ADLC state for GTM/Analytics triggers
  3. Detect idle PDLC Claude and inject check-pdlc
  4. Kanban summary (every 6th run = 60 min)

GTM Auto-Trigger Logic

# For each service with a spec under each project:
if ADLC state shows STAGING_DEPLOYED for $SERVICE:
    if no GTM brief exists at ~/dev/specs/$PROJECT/pdlc/gtm/$SERVICE-gtm.md:
        if $SERVICE-gtm.status != RUNNING and != DONE:
            inject "Spawn /agent gtm for $SERVICE"
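The pseudocode above can be made concrete. In this sketch the state file, brief path, and status path are passed in as arguments so the gate logic is visible; the real dispatcher derives them from $PROJECT/$SERVICE and injects via tmux send-keys rather than echoing, and the STAGING_DEPLOYED grep pattern is an assumption about the state-file format.

```shell
# Sketch of the GTM auto-trigger gate (argument-driven for illustration).
maybe_trigger_gtm() {
    local state_file="$1" brief="$2" status_file="$3" service="$4"
    # Only fire once the service is staged...
    grep -q "STAGING_DEPLOYED" "$state_file" || return 0
    # ...and no GTM brief exists yet...
    [ -f "$brief" ] && return 0
    # ...and no gtm agent is RUNNING or already DONE.
    if [ -f "$status_file" ]; then
        case "$(awk '{print $1; exit}' "$status_file")" in
            RUNNING|DONE) return 0 ;;
        esac
    fi
    echo "Spawn /agent gtm for $service"   # real dispatcher: tmux send-keys
}
```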

Idle Detection

# If last visible line contains "bypass permissions" (Claude prompt)
# AND no agents running (no "local agents" or "background tasks" in pane):
inject "Run check-pdlc. Read ~/.claude/skills/check-pdlc/SKILL.md."
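As a runnable sketch, with the pane text passed in as a string (the dispatcher obtains it with tmux capture-pane -p -t ods-pdlc):

```shell
# Idle check sketch: idle = Claude prompt visible on the last line AND
# no agent activity anywhere in the pane.
pdlc_is_idle() {
    local pane_text="$1"
    printf '%s\n' "$pane_text" | tail -n 1 | grep -q "bypass permissions" || return 1
    printf '%s\n' "$pane_text" | grep -Eqi "local agents|background tasks" && return 1
    return 0
}
```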

7. Service Classification

Three Service Types

| Type | Description | Deployment | Coolify Entity | Health Check |
| --- | --- | --- | --- | --- |
| web-service | REST API or frontend, runs permanently | Dockerfile build | Coolify Application | /health endpoint |
| infrastructure | Broker, database, cache (Redpanda, ClickHouse, etc.) | docker-compose | Coolify Service | Container status |
| script | Migration, data import, CLI tool | None (one-shot) | None | Build check only |

service-project-map.json Structure

Location: ~/dev/ops/agents/service-project-map.json

{
  "oid": {
    "project": "ods-platform",
    "type": "web-service",
    "stack": "node",
    "deploy": "dockerfile"
  },
  "redpanda": {
    "project": "ods-platform",
    "type": "infrastructure",
    "stack": "docker",
    "deploy": "docker-compose"
  },
  "migration": {
    "project": "lejecos",
    "type": "script",
    "stack": "node",
    "deploy": "one-shot"
  }
}

Legacy format (backward compatible): "oid": "ods-platform" (simple string = project name, defaults to type=web-service, deploy=dockerfile).
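Both formats can be read with one helper. A sketch (python3-based, as the real dispatcher helpers are described to be; the helper name map_field is illustrative, and the defaulting mirrors the backward-compatibility rule above):

```shell
# Sketch: resolves project/type/deploy for a service, accepting both the
# object format and the legacy bare-string format.
map_field() {
    local map_file="$1" service="$2" field="$3"
    python3 - "$map_file" "$service" "$field" <<'PY'
import json, sys
map_file, service, field = sys.argv[1], sys.argv[2], sys.argv[3]
entry = json.load(open(map_file)).get(service)
if entry is None:
    print("")
    sys.exit(0)
if isinstance(entry, str):  # legacy: bare string = project name
    entry = {"project": entry, "type": "web-service", "deploy": "dockerfile"}
print(entry.get(field, ""))
PY
}
```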

Deploy Mode Mapping

| Deploy Mode | Coolify Action | Dispatcher Behavior |
| --- | --- | --- |
| dockerfile | Create Coolify Application, Dockerfile build pack | Spawn provisioner if no config, then devops deploy |
| docker-compose | Create Coolify Service | Spawn provisioner for compose, then devops deploy |
| one-shot | No Coolify deployment | Mark deploy as DONE immediately |

Current Projects

Project Specs Directory Services
ods-platform ~/dev/specs/ods-platform/ oid, redpanda, docstore, pdf-engine, notification-hub, workflow-engine, form-engine, billing-engine, securemail, doceditor, agenda
ods-dashboard ~/dev/specs/ods-dashboard/ ods-dashboard
lejecos ~/dev/specs/lejecos/ migration

8. External Dependencies

Resource Categories and Provisionability

| Resource Type | CLI/Method | Auto-Provisionable | Agent |
| --- | --- | --- | --- |
| PostgreSQL schema + user | psql | Yes | provisioner |
| GCS bucket + service account | gcloud storage | Yes | provisioner |
| Coolify app/service | Coolify API | Yes (if token exists) | provisioner |
| GitHub repo | gh | Yes | provisioner |
| Redpanda topics | rpk / docker exec | Yes | provisioner |
| Redis instance | docker / GCP Memorystore | Yes | provisioner |
| TLS certificates | Coolify (Let’s Encrypt) | Yes (automatic) | Coolify |
| COOLIFY_API_TOKEN | Manual | No (first time) | Human |
| External API credentials (Stripe, SendGrid, CinetPay) | Manual | No | Human |
| DNS wildcard configuration | Manual | No (first time) | Human |
| SMTP server credentials | Manual | No | Human |

Known External Dependencies Per Service

Tracked in ~/dev/ops/external-deps.md:

| Service | Dependency | Type | Auto |
| --- | --- | --- | --- |
| oid | PostgreSQL 5433 | infra | Yes |
| docstore | S3/MinIO | infra | Partial (need bucket + credentials) |
| notification-hub | SMTP server | infra | No |
| notification-hub | SendGrid API | external-api | No |
| pdf-engine | (self-contained) | - | N/A |
| billing-engine | Stripe API | external-api | No |
| billing-engine | CinetPay API | external-api | No |
| all | Coolify | deployment | Partial (need per-service UUID) |

Escalation Format for Human Blockers

When a resource requires human action, the provisioner (or orchestrator) posts to Slack DM:

:key: EXTERNAL BLOCKER -- {service}/{task}
Category: {credentials|infrastructure|deployment|external-api|network|permissions}
Missing: {specific resource or credential}
Spec reference: {where in spec.md this is mentioned}
Impact: {what cannot proceed without this}
Action needed: {exact steps for the human}

After posting:

  1. Mark task as BLOCKED_EXTERNAL in pipeline state
  2. Log to ~/dev/ops/outputs/external-blockers.log with timestamp
  3. Move to next task (do NOT wait for resolution)
  4. When human responds in Slack, the slack-bridge skill resets the blocked status


9. Resolver Protocol

When It Triggers

The dispatcher spawns the resolver agent when:

  1. 2+ services blocked on the same root cause – maybe_spawn_resolver() counts blocked/failed deploy/provisioner status files and groups them by reason. If the total is >= 2 and there are <= 2 unique reasons, the blocker is systemic.
  2. 2+ corrupted status files – a non-standard first word in status files indicates agents are writing an invalid format.
  3. Periodic scan (every 30 min) finds recurring patterns.
  4. Human request via Slack: “resolve”, “root cause”, “why is everything blocked”

Cooldown: resolver will not re-run within 1 hour of last run (checks resolver-last-run.ts).

6-Phase Cycle

Phase 1: Root Cause Analysis

Trace from symptom to root cause using a structured tree:

BLOCKER: [description]
+-- IMMEDIATE: [what directly failed]
+-- AGENT GAP: [which agent should have caught/handled this]
+-- SPEC GAP: [what the spec should have specified]
+-- TEMPLATE GAP: [what the spec-writer template is missing]
+-- PIPELINE GAP: [what the dispatcher/provisioner doesn't check]
+-- DESIGN GAP: [systemic assumption that was wrong]

Sources read: blocked/failed status files, pipeline.log, external-blockers.log, agent definitions, skill definitions, specs.

Phase 2: Plan

A detailed fix plan is written BEFORE any changes. For each proposed change:

- Component and file path
- What to modify
- Why this fixes the root cause

Phase 3: Impact Analysis (MANDATORY)

For each proposed change, evaluate across 4 dimensions:

A. ADLC Integrity Check: Does it alter pipeline sequence? Weaken review gates? Bypass safety mechanisms?

B. PDLC Integrity Check: Does it alter product lifecycle? Affect spec quality? Bypass validation?

C. Bias Detection: Does it favor one service type? Reduce observability? Make rollback harder?

D. Impact Score (6 dimensions, each 0-3):

Dimension Scale
ADLC pipeline integrity 0=no impact, 3=breaks pipeline
PDLC pipeline integrity 0=no impact, 3=breaks pipeline
Review quality 0=no impact, 3=weakens reviews
System generality 0=no impact, 3=becomes specific
Observability 0=no impact, 3=reduces visibility
Rollback safety 0=easy rollback, 3=irreversible
TOTAL sum / 18

Decision thresholds:

Total Score Decision
0-2 AUTO-APPLY – minimal impact, proceed
3-5 APPLY WITH MONITORING – low risk, watch closely
6-9 HUMAN REVIEW REQUIRED – post plan to Slack DM, wait for approval
10+ DO NOT APPLY – redesign the fix
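The thresholds reduce to a small pure function. A sketch (function name illustrative):

```shell
# Maps a total impact score (sum of six 0-3 dimensions, max 18) to the
# resolver decision from the table above.
impact_decision() {
    local total="$1"
    if   [ "$total" -le 2 ]; then echo "AUTO-APPLY"
    elif [ "$total" -le 5 ]; then echo "APPLY WITH MONITORING"
    elif [ "$total" -le 9 ]; then echo "HUMAN REVIEW REQUIRED"
    else                          echo "DO NOT APPLY"
    fi
}
```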

Phase 4: Apply (if impact acceptable)

What resolver CAN modify:

- Agent definitions (~/.claude/agents/*.md)
- Skill definitions (~/.claude/skills/*/SKILL.md)
- Dispatcher scripts (~/dev/ops/adlc-v2/scripts/*.sh, ~/dev/ops/pdlc/scripts/*.sh)
- Spec-writer templates
- CLAUDE.md orchestrator rules
- Provisioner behavior
- service-project-map.json

What resolver CANNOT modify:

- Application source code in ~/dev/projects/*/src/
- Spec content for specific services
- Review reports (read-only audit trails)
- Git history
- systemd units (escalate to human)

Phase 5: Post-Application Monitoring

After applying fixes:

  1. Record the fix in ~/dev/ops/outputs/resolver-fixes.log
  2. Write monitoring criteria to ~/dev/ops/outputs/resolver-monitor-$BLOCKER_ID.json
  3. On the next resolver scan (30 min): read monitor files, execute check commands, compare against success criteria
  4. If success: mark resolved, clean up
  5. If failed: execute the rollback plan, escalate to Slack DM

The dispatcher also checks monitoring files: if monitorUntil timestamp has passed, it spawns the resolver for verification.

Phase 6: Documentation

Write the resolution report to ~/dev/ops/reviews/resolver/RES-$DATE-$SEQ.json with:

- Blocker description, services affected, duration blocked
- Root cause chain (immediate, agent gap, spec gap, template gap, pipeline gap, design gap)
- Fix details (plan, files modified, impact score)
- Monitoring results (expected vs actual outcome)
- Lessons learned, prevention measures

Append summary to ~/dev/ops/reviews/resolver/changelog.md.


10. Slack Channels and Communication

Channel Map

| Channel | ID | Purpose | Pipeline |
| --- | --- | --- | --- |
| ADLC | C0AN0N8AUGZ | Pipeline commands, progress milestones, kanban | ADLC |
| PDLC | C0AN42N3C0L | PM commands, product updates, PDLC kanban | PDLC |
| DM | D0AGRAVEC1K | Blockers, human review, human interaction | Both |

Token Loading

source ~/.env.adlc 2>/dev/null || source ~/.env.openclaw 2>/dev/null

Message Formats by Type

Blocker Notification (DM)

:rotating_light: BLOCKED -- {service}/{task}
Reason: {reason}
Action needed: {what the human should do}

External Blocker (DM)

:key: EXTERNAL BLOCKER -- {service}/{task}
Category: {credentials|infrastructure|deployment|external-api|network|permissions}
Missing: {specific resource or credential}
Spec reference: {where in spec.md this is mentioned}
Impact: {what cannot proceed without this}
Action needed: {exact steps for the human}

Human Review Required (DM)

:eyes: HUMAN REVIEW -- {service}/{task}
Context: {summary}
Options: {what the human can reply}

Progress Milestone (ADLC Channel)

:white_check_mark: {service} -- {milestone}
{brief details}

Agent Results (ADLC Channel)

[BA] $SERVICE: $STATUS -- $CRITERIA_MET/$CRITERIA_TOTAL criteria met. $DEVIATIONS deviations.
[ARCHITECT] $SERVICE: $VERDICT -- $CHECKS_PASSED/8 checks passed.
[SECURITY] $SERVICE: $STATUS -- OWASP $SCORE/10, severity=$SEVERITY, $FINDINGS findings.
[E2E] $SERVICE: $VERDICT -- API: $PASSED/$TOTAL, Browser: $PASSED/$TOTAL.
[PROVISIONER] $SERVICE: $VERDICT -- Coolify=$X, DB=$Y, Bucket=$Z.
[AUDIT] $VERDICT -- $N services. $FINDINGS findings ($CRITICAL critical, $HIGH high).

PDLC Notifications (PDLC Channel)

[DISCOVERY] $OPPORTUNITY -- $RECOMMENDATION ($PRIORITY). Impact: $IMPACT.
[SPEC] $SERVICE spec written -- $AC_COUNT acceptance criteria, $ENDPOINT_COUNT endpoints.
[PRIORITY] $PROJECT backlog re-prioritized -- $TOTAL items scored. Top 3: $TOP3.
[VALIDATION] $SERVICE -- $VERDICT. Effort: $EFFORT. Risks: $RISK_COUNT.
[GTM] $SERVICE brief ready -- rollout plan, docs checklist, success metrics.
[ANALYTICS] $SERVICE -- $KPI_COUNT KPIs, $EVENT_COUNT events, $DASHBOARD_COUNT dashboards.

PDLC to ADLC Handoff (ADLC Channel)

:package: PDLC->ADLC handoff: $SERVICE spec ready for development.

Resolver Notifications

:mag: RESOLVER analyzing systemic blocker: $DESCRIPTION ($N services affected)
:wrench: RESOLVER applying fix: $SUMMARY (impact $SCORE/18 -- auto-approved)
:eyes: RESOLVER needs approval: $SUMMARY (impact $SCORE/18) [DM]
:white_check_mark: RESOLVER fix verified: $BLOCKER_ID -- $N services unblocked.
:rotating_light: RESOLVER fix FAILED: $BLOCKER_ID -- rolling back. $REASON

Blocker Resolution Flow

  1. System detects blocker, posts to DM with :key: or :rotating_light: format
  2. Human fixes the issue externally (adds credentials, creates resource, etc.)
  3. Human responds in Slack DM with one of: resolved, fixed, done, c'est fait, ok, credentials added, token ajoute, cle ajoutee, {service} unblocked
  4. Slack bridge (ADLC or PDLC) detects message, routes to orchestrator
  5. Orchestrator’s slack-bridge skill clears the service’s blocked status files
  6. Dispatcher’s next 5-min cycle detects missing status files, resumes pipeline for the service

Slack Bridge Intent Routing (ADLC)

| Pattern | Action |
| --- | --- |
| status, etat, avancement | Run /status, post summary |
| kanban, board | Run /kanban, post board |
| pause, stop, arreter | Pause all agents |
| resume, reprendre, go | Resume pipeline |
| deploy {service} | Spawn DevOps deploy mode |
| merge {service} | Create PR and merge |
| rollback {service} | Roll back staging |
| onboard {project} | Queue generate-specs |
| launch {service} | Queue dev-task |
| audit | Spawn auditor |
| fix {service} {desc} | Spawn dev agent with fix |
| review {service} | Spawn BA + Architect + Security + DevOps |
| test {service} | Run test-runner.sh |
| e2e {service} | Spawn scenario + e2e-test |
| logs {service} | Read/summarize review reports |

Slack Bridge Intent Routing (PDLC)

| Pattern | Action |
| --- | --- |
| discover {topic} | Spawn discovery agent |
| spec {service} | Spawn spec-writer |
| prioritize, backlog | Spawn prioritization |
| validate {service} | Spawn validation |
| gtm {service} | Spawn GTM |
| analytics {service} | Spawn analytics |
| handoff {service} | Trigger ADLC handoff |
| status, pipeline | Post pipeline status |
| adlc status | Read and summarize ADLC state |
adlc status Read and summarize ADLC state

11. Known Issues and Mitigations

Context Window Exhaustion

Problem: Claude sessions accumulate context over hours, eventually degrading quality or crashing.

Mitigations:

- Daily restart at 4am UTC – kills and restarts both tmux sessions, runs /boot to reconstruct state from disk files
- session-health.sh (1 min) – monitors session age; if > 12h AND > 200 interactions, restarts when no agents are running
- Interaction counter (~/dev/ops/outputs/claude-interactions.count) – tracks cumulative injections
- Idle detection – if the pane is unchanged for 15 min with no agents, the session is stuck; kill and restart

Status File Corruption

Problem: Agents sometimes write non-standard status values (JSON blobs, service names, markdown).

Mitigations:

- read_status() validation – the dispatcher extracts the first word from each status file and checks it against the known list; corrupted files are auto-deleted and logged
- Resolver auto-trigger – if 2+ corrupted files are detected in a single scan, the resolver agent is spawned to trace which agent is writing the invalid format and fix its prompt
- Stale RUNNING detection – any RUNNING status file older than 30 minutes is auto-deleted (agent likely crashed)

Agent Writes Wrong Format

Problem: An agent writes a status file that doesn’t match the expected STATUS | date | agent | service | details format.

Mitigations:

- Dispatcher validation – read_status() only accepts known keywords; anything else is treated as corrupted
- JSON report fallback – if no .status file exists but a JSON review report does, the dispatcher recovers status from JSON (ba-report.json status=compliant -> ba.status = DONE)
- Resolver – traces which agent prompt is producing the invalid format and fixes the agent definition

Claude Acts Instead of Delegating

Problem: The orchestrator Claude writes code, runs tests, or does work it should delegate to subagents.

Mitigations:

- CLAUDE.md absolute rules – 8 rules that MUST NEVER be broken:
  1. NEVER edit source code
  2. NEVER run tests (test-runner.sh does that)
  3. NEVER create files in service directories
  4. NEVER run cargo/npm/pnpm/go/dotnet/node
  5. NEVER commit or push
  6. NEVER loop or schedule itself
  7. NEVER run build commands
  8. NEVER run PDLC skills (those are for the other orchestrator)
- Auditor agent – checks git history for commits by the supervisor and pipeline.log for direct file edits

PDLC Skills Executed by ADLC

Problem: ADLC orchestrator accidentally runs /check-pdlc or /boot-pdlc, which belong to the PDLC orchestrator.

Mitigation: CLAUDE.md Rule 8 explicitly states: > NEVER run /check-pdlc or /boot-pdlc – those are PDLC skills for the other orchestrator (ods-pdlc tmux). You are ADLC only. Your skills: /check-pipeline, /boot, /status, /kanban, /dev-task, /slack-bridge, /registry.

Circuit Breaker Exhaustion

Problem: A service fails 3+ times and gets permanently blocked.

Mitigations:

- Crash counter ($SERVICE.crashes) tracked per service
- After 3 crashes: the service is skipped by the dispatcher and BLOCKED is posted to Slack DM
- Human can reset: reply {service} unblocked in Slack DM to clear the status files
- Resolver can analyze systemic causes across multiple blocked services

Memory Pressure

Problem: Too many concurrent Claude agents exhaust server RAM.

Mitigations:

- Before spawning agents, the orchestrator checks awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo
- If > 2000MB: spawn freely, all pending work in parallel
- If < 2000MB: queue new spawns, wait for running agents to finish, post to Slack DM
- If < 512MB (critical): session-health.sh kills the largest non-supervisor Claude process
- Subagent design: review agents use sonnet (lighter) rather than opus; only dev, spec-writer, and resolver use opus
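The gate can be sketched as two small helpers (thresholds from the mitigation list; function names illustrative):

```shell
# MemAvailable in MB, read from /proc/meminfo (or a test file).
mem_available_mb() {
    awk '/MemAvailable/ {print int($2/1024)}' "${1:-/proc/meminfo}"
}

# Spawn decision from the thresholds above.
spawn_policy() {
    local mb="$1"
    if   [ "$mb" -lt 512 ];  then echo "CRITICAL"   # kill largest non-supervisor Claude
    elif [ "$mb" -lt 2000 ]; then echo "QUEUE"      # wait for running agents, notify DM
    else                          echo "SPAWN"      # spawn freely
    fi
}
```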

Duplicate Agent Spawning (PDLC)

Problem: PDLC dispatcher could spawn duplicate agents for the same service+stage.

Mitigation: check-pdlc skill implements anti-duplicate gates:

check_agent_status() {
  local status_file=~/dev/ops/outputs/${service}-${agent_type}.status
  # RUNNING -> do NOT spawn
  # DONE -> do NOT re-run
  # FAILED (<3 attempts) -> retry once
  # FAILED (>=3) -> BLOCKED
  # NONE -> eligible to spawn
}

Merge Conflicts During Branch Consolidation

Problem: Feature branches conflict when merged into dev.

Mitigation: The dispatcher tries git merge --no-edit. On failure:

- git merge --abort
- Log the conflict
- Post to Slack DM: :warning: Merge conflict: $SERVICE/$branch needs manual resolution
- Skip that branch, continue with others
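The merge-or-skip step can be sketched as follows (run inside the service checkout; logging and Slack posting elided, and the function name is illustrative):

```shell
# Try to merge one feature branch; on conflict, abort the merge and
# report failure so the caller can skip the branch and continue.
consolidate_branch() {
    local branch="$1"
    if git merge --no-edit "$branch" >/dev/null 2>&1; then
        return 0
    fi
    git merge --abort >/dev/null 2>&1
    echo "merge conflict: $branch needs manual resolution" >&2
    return 1
}
```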


12. File System Map

Root Directories

Path Purpose
~/dev/ Working root for all development
~/dev/projects/ Git repositories for each service
~/dev/specs/ Project specs, backlogs, roadmaps, PDLC artifacts
~/dev/ops/ Operations: scripts, agents, reviews, outputs
~/.claude/ Claude Code configuration and agent memory

Operations (~/dev/ops/)

Path Purpose
~/dev/ops/outputs/ Status files, pipeline log, crash counters, dispatcher state
~/dev/ops/outputs/pipeline.log ADLC dispatcher log (all transitions, injections, errors)
~/dev/ops/outputs/pdlc-pipeline.log PDLC dispatcher log
~/dev/ops/outputs/session-health.log Session health monitor log
~/dev/ops/outputs/external-blockers.log External blocker history
~/dev/ops/outputs/resolver-fixes.log Resolver fix history
~/dev/ops/outputs/$SERVICE-$STAGE.status Per-service per-stage status files
~/dev/ops/outputs/$SERVICE.crashes Per-service crash counter
~/dev/ops/outputs/dispatcher-run-count ADLC dispatcher run counter (for kanban every 6th)
~/dev/ops/outputs/pdlc-dispatcher-run-count PDLC dispatcher run counter
~/dev/ops/outputs/session-health-state Idle counter and hash state
~/dev/ops/outputs/claude-interactions.count Cumulative interaction counter
~/dev/ops/outputs/resolver-last-run.ts Timestamp of last resolver execution
~/dev/ops/outputs/resolver-monitor-*.json Post-fix monitoring criteria
~/dev/ops/reviews/ Review reports directory
~/dev/ops/reviews/$SERVICE/ Per-service review reports
~/dev/ops/reviews/$SERVICE/ba-report.json BA review report
~/dev/ops/reviews/$SERVICE/architect-report.json Architect review report
~/dev/ops/reviews/$SERVICE/security-report.json Security review report
~/dev/ops/reviews/$SERVICE/devops-report.json DevOps review report
~/dev/ops/reviews/$SERVICE/deploy-report.json Deploy report
~/dev/ops/reviews/$SERVICE/e2e-report.json E2E test report
~/dev/ops/reviews/$SERVICE/provisioner-report.json Provisioner report
~/dev/ops/reviews/$SERVICE/screenshots/ E2E browser test screenshots
~/dev/ops/reviews/$SERVICE/last-reviewed-commit.txt Last commit reviewed by BA
~/dev/ops/reviews/$SERVICE/last-fail-commit.txt Commit at time of last review failure
~/dev/ops/reviews/adlc-audit-YYYYMMDD-HHMM.json Periodic audit reports
~/dev/ops/reviews/resolver/ Resolver resolution reports and changelog
~/dev/ops/reviews/resolver/RES-$DATE-$SEQ.json Individual resolution reports
~/dev/ops/reviews/resolver/changelog.md Resolver fix history
~/dev/ops/agents/service-project-map.json Service -> project mapping (THE registry)
~/dev/ops/coolify/$SERVICE.json Coolify deployment config per service
~/dev/ops/external-deps.md Known external dependencies table
~/dev/ops/slack-inbox/ ADLC Slack message inbox
~/dev/ops/slack-inbox/processed/ Processed ADLC Slack messages
~/dev/ops/pdlc-slack-inbox/ PDLC Slack message inbox
~/dev/ops/pdlc-slack-inbox/processed/ Processed PDLC Slack messages

Scripts

| Path | Purpose | Interval |
| --- | --- | --- |
| ~/dev/ops/adlc-v2/scripts/dispatcher-v3.sh | ADLC bash dispatcher | 5 min (systemd) |
| ~/dev/ops/adlc-v2/scripts/session-health.sh | Claude session health monitor | 1 min (systemd) |
| ~/dev/ops/adlc-v2/scripts/test-runner.sh | Bash test runner (zero tokens) | On demand |
| ~/dev/ops/adlc-v2/scripts/slack-bridge.sh | ADLC Slack polling bridge | Continuous |
| ~/dev/ops/pdlc/scripts/dispatcher-pdlc.sh | PDLC bash dispatcher | 10 min (systemd) |
| ~/dev/ops/pdlc/scripts/pdlc-slack-bridge.sh | PDLC Slack polling bridge | Continuous |

Agent Definitions

Path Agent
~/.claude/agents/dev.md Developer agent
~/.claude/agents/ba.md Business Analyst agent
~/.claude/agents/architect.md Architect review agent
~/.claude/agents/security.md Security review agent
~/.claude/agents/devops.md DevOps review/deploy agent
~/.claude/agents/pr.md PR creation agent
~/.claude/agents/provisioner.md Infrastructure provisioner agent
~/.claude/agents/scenario.md E2E scenario generator agent
~/.claude/agents/e2e-test.md E2E test executor agent
~/.claude/agents/auditor.md ADLC compliance auditor agent
~/.claude/agents/resolver.md Systemic problem resolver agent
~/.claude/agents/discovery.md Product discovery agent
~/.claude/agents/spec-writer.md Spec writer agent
~/.claude/agents/prioritization.md Backlog prioritization agent
~/.claude/agents/validation.md Spec validation agent
~/.claude/agents/gtm.md Go-to-market agent
~/.claude/agents/analytics.md Analytics/KPI agent

Skill Definitions

| Path | Skill | Pipeline |
| --- | --- | --- |
| ~/.claude/skills/dev-task/SKILL.md | /dev-task – develop a feature | ADLC |
| ~/.claude/skills/boot/SKILL.md | /boot – daily context rebuild | ADLC |
| ~/.claude/skills/status/SKILL.md | /status – show current state | ADLC |
| ~/.claude/skills/kanban/SKILL.md | /kanban – visual kanban board | ADLC |
| ~/.claude/skills/registry/SKILL.md | /registry – maintain service-project-map.json | ADLC |
| ~/.claude/skills/check-pipeline/SKILL.md | /check-pipeline – scan and advance pipeline | ADLC |
| ~/.claude/skills/slack-bridge/SKILL.md | /slack-bridge – interpret Slack messages | ADLC |
| ~/.claude/skills/boot-pdlc/SKILL.md | /boot-pdlc – PDLC context rebuild | PDLC |
| ~/.claude/skills/pdlc-bridge/SKILL.md | /pdlc-bridge – interpret PDLC Slack messages | PDLC |
| ~/.claude/skills/check-pdlc/SKILL.md | /check-pdlc – scan and advance PDLC pipeline | PDLC |

Project Specs (~/dev/specs/$PROJECT/)

Path Purpose
~/dev/specs/$PROJECT/PROJECT.md Project definition
~/dev/specs/$PROJECT/context/architecture.md Architecture decisions
~/dev/specs/$PROJECT/context/business-rules.md Business rules
~/dev/specs/$PROJECT/gestion/progress.md Dev progress tracking
~/dev/specs/$PROJECT/gestion/backlog.md Prioritized backlog (RICE scored)
~/dev/specs/$PROJECT/gestion/roadmap.md Timeline / roadmap
~/dev/specs/$PROJECT/specs/$SERVICE/spec.md Service specification
~/dev/specs/$PROJECT/pdlc/pipeline-state.md PDLC pipeline state per feature
~/dev/specs/$PROJECT/pdlc/opportunities/$NAME.md Discovery opportunity briefs
~/dev/specs/$PROJECT/pdlc/opportunities/$NAME.json Discovery opportunity JSON summaries
~/dev/specs/$PROJECT/pdlc/validations/$SERVICE-validation.json Spec validation reports
~/dev/specs/$PROJECT/pdlc/handoffs/$SERVICE-handoff.json ADLC handoff files
~/dev/specs/$PROJECT/pdlc/gtm/$SERVICE-gtm.md GTM briefs
~/dev/specs/$PROJECT/pdlc/analytics/$SERVICE-kpis.md Analytics/KPI definitions
~/dev/specs/$PROJECT/pdlc/prioritization-$DATE.json Prioritization snapshots

Service Projects (~/dev/projects/$SERVICE/)

Path Purpose
~/dev/projects/$SERVICE/CLAUDE.md Service-specific Claude rules
~/dev/projects/$SERVICE/.env Environment variables (runtime)
~/dev/projects/$SERVICE/.env.example Env var documentation
~/dev/projects/$SERVICE/.env.staging Staging URL and config
~/dev/projects/$SERVICE/tests/e2e/scenarios.json API E2E test scenarios
~/dev/projects/$SERVICE/tests/e2e/browser-scenarios.json Browser E2E test scenarios
~/dev/projects/$SERVICE/tests/e2e/mock-data.sql E2E test data
~/dev/projects/$SERVICE/tests/e2e/cleanup.sql E2E cleanup script

Claude Memory (~/.claude/)

Path Purpose
~/.claude/agent-memory/pipeline/state.md ADLC pipeline state (read by both pipelines)
~/.claude/agent-memory/pipeline/last-audit.md Last audit summary
~/.claude/agent-memory/pipeline/last-audit-ts.txt Last audit timestamp
~/.claude/agent-memory/pipeline/run-count Check-pipeline run counter
~/.claude/agent-memory/pipeline/retries-$SERVICE.txt Per-service retry counter
~/.claude/agent-memory/pdlc-bridge/ PDLC bridge timestamps
~/dev/CLAUDE.md ADLC orchestrator rules

Infrastructure

Path Purpose
~/dev/ops/coolify/$SERVICE.json Coolify app config (UUID, URLs, registry)
~/dev/infra-registry/ Infrastructure asset registry (git repo)
~/.env.adlc ADLC environment (SLACK_BOT_TOKEN, COOLIFY_API_TOKEN, etc.)
~/.env.openclaw Fallback environment file
/tmp/dispatcher-v3.lock ADLC dispatcher lock file
/tmp/dispatcher-pdlc.lock PDLC dispatcher lock file

Database

Resource Connection
PostgreSQL postgres://ods:ods-dev-2026@127.0.0.1:5433/ods

GCP

Resource Value
Project ID ninth-park-452914-v8
Region europe-west1
Bucket naming ods-$SERVICE-staging


13. Innovation Pipeline

Architecture

Third pipeline alongside ADLC and PDLC. Feeds both with technology signals.

Agents (5)

| Agent | Model | Turns | Role |
| --- | --- | --- | --- |
| veille | sonnet | 30 | Daily tech watch: releases, CVEs, trending repos (top 10 rotation), ad-hoc URL review, security audit |
| benchmark | opus | 35 | Monthly comparison of ODS vs industry best practices |
| poc-builder | opus | 40 | Rapid PoC prototypes in ~/dev/pocs/ |
| innovation-scorer | sonnet | 20 | FIRE framework scoring + correlation with ODS backlog/specs/daily findings |
| adr-writer | sonnet | 15 | Architecture Decision Records |

Slack Channel

#Innovation (C0AMSKF5NCF), polled by the innovation bridge.

Commands (via Slack #Innovation)

Pattern Action
Any URL Ad-hoc veille review with security audit
veille, watch Run daily veille
benchmark Run monthly benchmark
poc {description} Build proof of concept
score {proposal} FIRE score + correlation
adr {decision} Write Architecture Decision Record
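A hedged sketch of how a bridge might map these message patterns to actions. Only the patterns come from the table above; `route_innovation_msg` and the action strings it emits are illustrative names, not the real bridge implementation:

```shell
# Illustrative routing sketch for #Innovation messages. The patterns
# mirror the command table; the function name and its outputs are
# hypothetical.
route_innovation_msg() {
  local msg=$1
  case $msg in
    *http://*|*https://*) echo "adhoc-veille-review" ;;       # Any URL
    veille|watch)         echo "run-veille" ;;                # Daily veille
    benchmark)            echo "run-benchmark" ;;             # Monthly benchmark
    "poc "*)              echo "build-poc: ${msg#poc }" ;;
    "score "*)            echo "fire-score: ${msg#score }" ;;
    "adr "*)              echo "write-adr: ${msg#adr }" ;;
    *)                    echo "ignore" ;;
  esac
}
```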

Timers

Timer Schedule Action
ods-innovation.timer Daily 07:00 UTC Run veille agent
ods-innovation-bridge.service Always-on Poll #Innovation every 30s
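A plausible shape for ods-innovation.timer under the schedule above; this is an illustrative sketch, as the real unit file is not shown in this document:

```ini
# Illustrative sketch of ods-innovation.timer (daily 07:00 UTC).
[Unit]
Description=Daily innovation veille run

[Timer]
OnCalendar=*-*-* 07:00:00 UTC
Persistent=true

[Install]
WantedBy=timers.target
```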

Output Format

FIRE Scoring Framework

Integration

14. CLI Tools (Deterministic Output)

Available Tools

CLI Purpose Validates
write-status.sh Status files Status ∈ enum (11 values), agent ∈ enum
write-review.sh JSON review reports Schema per agent type, verdict ∈ enum
write-lesson.sh Lessons learned All 6 fields non-empty (min 5 chars)
write-pipeline-state.sh Pipeline state State ∈ enum (27 values)
write-human-review.sh Human decision reports JSON schema, generates HTML + PDF + GDrive + Slack

Principle

Agents MUST use the CLI tools to write output files; they never use Write/Edit directly. Each CLI validates the format and rejects invalid input, so an agent has no choice but to produce correct output.
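A minimal sketch of the validate-and-reject pattern, in the spirit of write-status.sh; the function name and the enum subset below are illustrative, whereas the real CLI validates against the full 11-value status enum:

```shell
# Illustrative validate-and-reject sketch. The enum here is a made-up
# subset for demonstration, not the real status enum.
validate_status() {
  local status=$1
  case $status in
    RUNNING|DONE|FAILED|BLOCKED)
      return 0 ;;
    *)
      echo "invalid status: $status" >&2
      return 1 ;;
  esac
}
```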

15. Bridges (Slack → Orchestrator)

Architecture

All three bridges poll every 30s and inject messages directly into tmux via send-keys. The ADLC bridge additionally falls back to inbox files if its tmux session is down.

Bridge Channel Tmux Target Method
ADLC C0AN0N8AUGZ + D0AGRAVEC1K ods-claude Direct tmux + inbox fallback
PDLC C0AN42N3C0L ods-pdlc Direct tmux
Innovation C0AMSKF5NCF ods-claude Direct tmux
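A hedged sketch of the inject-with-fallback delivery path described above, assuming `deliver_msg`, `INBOX_DIR`, and the inbox filename as illustrative names:

```shell
# Illustrative sketch of bridge delivery: try to inject a message into
# the supervisor's tmux session with send-keys, and fall back to an
# inbox file if tmux (or the target session) is unavailable. All names
# here are hypothetical, not the real bridge code.
INBOX_DIR=${INBOX_DIR:-/tmp/bridge-inbox}

deliver_msg() {
  local session=$1 msg=$2
  if command -v tmux >/dev/null 2>&1 && tmux has-session -t "$session" 2>/dev/null; then
    # Direct injection: type the message into the session and press Enter.
    tmux send-keys -t "$session" "$msg" Enter
  else
    # Fallback: append to an inbox file polled by the orchestrator later.
    mkdir -p "$INBOX_DIR"
    printf '%s\n' "$msg" >> "$INBOX_DIR/$session.inbox"
  fi
}
```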

Known Issues

16. PDLC Adaptive Modes

The PDLC dispatcher auto-detects its mode based on pending work:

Mode Trigger Scan Interval Actions
Active Specs being written/validated/designed 10 min Full pipeline scan + ADLC monitoring + poke Claude
Monitoring All specs done, ADLC working 30 min Check ADLC state → GTM/Analytics/UI validation triggers only
Idle Nothing pending, all deployed Bridge only Process Slack inbox, no scan

Mode Detection Logic

pending = count(spec-writing + validating + UI_DESIGN + DISCOVERY in pipeline-state.md)
running = count(RUNNING in *-spec-writer.status, *-validation.status, *-ui-design.status)
monitoring_work = count(STAGING_DEPLOYED without GTM, PROD_DEPLOYED without Analytics)

if pending + running > 0 → active
elif monitoring_work > 0 → monitoring
else → idle
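The pseudocode above maps directly onto a small bash helper. This sketch is illustrative (`detect_mode` is a made-up name), with the three counts passed in as arguments rather than scanned from pipeline-state.md and the status files:

```shell
# Illustrative bash version of the mode-detection pseudocode above.
# The real dispatcher derives these counts by scanning the state files.
detect_mode() {
  local pending=$1 running=$2 monitoring_work=$3
  if (( pending + running > 0 )); then
    echo active
  elif (( monitoring_work > 0 )); then
    echo monitoring
  else
    echo idle
  fi
}
```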

Mode Transitions

Monitoring Triggers

ADLC State PDLC Action
STAGING_DEPLOYED + no GTM brief Spawn GTM agent
STAGING_DEPLOYED + has design brief + no UI review Spawn UI Designer Mode 2 (visual validation)
PROD_DEPLOYED + no Analytics KPIs Spawn Analytics agent
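The triggers above can be read as a simple decision function. A hedged sketch, where `pdlc_monitor_action` and its yes/no flags are illustrative stand-ins for the real state-file checks:

```shell
# Illustrative decision sketch for the monitoring triggers above. The
# real dispatcher reads ADLC state and artifact presence from files;
# this function and its flag arguments are hypothetical.
pdlc_monitor_action() {
  local adlc_state=$1 has_gtm=$2 has_design_brief=$3 has_ui_review=$4 has_kpis=$5
  case $adlc_state in
    STAGING_DEPLOYED)
      if [ "$has_gtm" = no ]; then
        echo "spawn-gtm"
      elif [ "$has_design_brief" = yes ] && [ "$has_ui_review" = no ]; then
        echo "spawn-ui-designer-mode2"
      else
        echo "none"
      fi ;;
    PROD_DEPLOYED)
      if [ "$has_kpis" = no ]; then echo "spawn-analytics"; else echo "none"; fi ;;
    *) echo "none" ;;
  esac
}
```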