Manage ADLC subagents: spawning, monitoring, tracking results, handling failures, and applying the circuit breaker pattern. Ensures efficient resource usage and prevents runaway failures.
Applies to the ADLC orchestrator managing all subagent types: dev, ba, architect, security, devops, pr, deploy, e2e-test, scenario, auditor, resolver, provisioner.
- ods-claude tmux session
- ~/.claude/agents/
- ~/dev/ops/adlc-v2/scripts/cli/

Always check memory before spawning:
MEM=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
echo "Available memory: ${MEM}MB"

| Memory | Action |
|---|---|
| > 2000MB | Spawn freely, launch all pending work in parallel |
| 1000-2000MB | Queue new spawns, wait for running agents to finish |
| < 1000MB | Do NOT spawn, post CRITICAL to Slack DM |
| < 512MB | Kill no agents but queue ALL new spawns |
Critical rule: Do NOT artificially limit to 1-2 agents when RAM is available. Do NOT sleep between spawns. Check memory once, then launch everything.
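The thresholds in the table above can be folded into one decision function. This is a minimal sketch, not part of the ADLC tooling: the function names `memory_action` and `available_mb` and the action labels `spawn`, `queue`, and `halt` are assumptions made for illustration. The `< 512MB` row is treated as `halt` too, since both low-memory rows forbid new spawns.

```shell
# Hypothetical helper: map available memory (MB) to a spawn decision.
# The labels spawn/queue/halt are invented for this sketch.
memory_action() {
  if [ "$1" -gt 2000 ]; then
    echo "spawn"   # launch all pending work in parallel
  elif [ "$1" -ge 1000 ]; then
    echo "queue"   # queue new spawns, wait for running agents to finish
  else
    echo "halt"    # do not spawn; post CRITICAL to Slack DM
  fi
}

# Read MemAvailable the same way the document does (Linux only).
available_mb() {
  awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo
}
```

A caller would check once, for example `ACTION=$(memory_action "$(available_mb)")`, and then launch or queue everything based on that single reading, in line with the rule against sleeping between spawns.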
Exception: Max 1 Rust cargo build at a time (lesson from 2026-03-20 OOM crash). Each Rust compilation uses 500MB-1.5GB.
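One possible way to enforce the single-Rust-build rule is a lock directory, since `mkdir` either creates the directory or fails atomically. The sketch below is an assumption, not the orchestrator's actual mechanism: the lock path and the names `acquire_rust_slot` and `release_rust_slot` are hypothetical.

```shell
# Hypothetical lock location; not part of the original tooling.
RUST_LOCK_DIR="${RUST_LOCK_DIR:-${TMPDIR:-/tmp}/adlc-rust-build.lock}"

# Claim the single Rust build slot. Succeeds for exactly one caller
# because mkdir fails if the directory already exists.
acquire_rust_slot() {
  mkdir "$RUST_LOCK_DIR" 2>/dev/null
}

# Free the slot once the build has finished.
release_rust_slot() {
  rmdir "$RUST_LOCK_DIR" 2>/dev/null || true
}

# Example use (commented out so the sketch has no side effects):
# if acquire_rust_slot; then
#   trap release_rust_slot EXIT
#   cargo build
# else
#   echo "Rust build already running; queueing this one"
# fi
```

A lock like this goes stale if the holder is killed before releasing it, so a real setup would also need a cleanup path for that case.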
/agent dev "SERVICE: {service}. PROJECT: {project}. TASK: {task_id} -- {description}. Spec: ~/dev/specs/{project}/specs/{service}/spec.md. Focus ONLY on this task."
/agent ba "SERVICE: {service}. PROJECT: {project}. Review against spec. Write JSON to ~/dev/ops/reviews/{service}/ba-report.json."
/agent architect "SERVICE: {service}. PROJECT: {project}. Write JSON to ~/dev/ops/reviews/{service}/architect-report.json"
/agent security "SERVICE: {service}. PROJECT: {project}. Write JSON to ~/dev/ops/reviews/{service}/security-report.json"
/agent devops "SERVICE: {service}. PROJECT: {project}. MODE: review. Write JSON to ~/dev/ops/reviews/{service}/devops-report.json"
/agent pr "SERVICE: {service}. PROJECT: {project}. Reviews in ~/dev/ops/reviews/{service}/. Create PR and merge to staging."
/agent devops "SERVICE: {service}. PROJECT: {project}. MODE: deploy. Config: ~/dev/ops/coolify/{service}.json. Verify health check."
/agent scenario "SERVICE: {service}. PROJECT: {project}. Generate E2E scenarios. Write to ~/dev/projects/{service}/tests/e2e/"
Wait for completion, then:
/agent e2e-test "SERVICE: {service}. PROJECT: {project}. Execute E2E tests. Scenarios: ~/dev/projects/{service}/tests/e2e/"
/agent auditor "Audit all active projects. Check pipeline sequence, review quality, registry, test coverage. Write to ~/dev/ops/reviews/"
After spawning, agents write their outputs to:

- Status files: ~/dev/ops/outputs/{service}-{agent}.status
- JSON reports: ~/dev/ops/reviews/{service}/{agent}-report.json
- Pipeline state: ~/.claude/agent-memory/pipeline/state.md
- Lessons learned: ~/dev/ops/lessons-learned.md
Check for completion:
# Check all status files for a service
for f in ~/dev/ops/outputs/{service}-*.status; do
  [ -f "$f" ] && echo "$(basename "$f"): $(head -1 "$f")"
done

Validate status file integrity:
bash ~/dev/ops/adlc-v2/scripts/validate-status.sh

| Status | Meaning | Next Action |
|---|---|---|
| DONE | Agent completed successfully | Advance pipeline |
| FAILED | Agent found issues | Read report, spawn fix |
| RUNNING | Agent still working | Wait, check again next cycle |
| BLOCKED | Agent cannot proceed | Analyze cause, escalate |
| BLOCKED_EXTERNAL | Missing external dep | Follow SOP-007 |
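The status table can also be read as a lookup that rejects anything outside the five documented keywords. The sketch below is illustrative only: `next_action` is a hypothetical name, and it takes the keyword as an argument rather than assuming a particular status file layout.

```shell
# Hypothetical helper: map a status keyword to the next action from the table.
# Unknown keywords (for example SECURITY_PASS) are rejected with exit code 1.
next_action() {
  case "$1" in
    DONE)             echo "advance pipeline" ;;
    FAILED)           echo "read report, spawn fix" ;;
    RUNNING)          echo "wait, check again next cycle" ;;
    BLOCKED)          echo "analyze cause, escalate" ;;
    BLOCKED_EXTERNAL) echo "follow SOP-007" ;;
    *)                echo "invalid status: $1" >&2; return 1 ;;
  esac
}
```

Returning a non-zero code for unknown keywords lets the caller treat an invented status as a validation failure instead of silently waiting on it.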
Read JSON reports to determine next steps:
python3 -c "
import json, glob
for f in glob.glob('$HOME/dev/ops/reviews/{service}/*-report.json'):
    r = json.load(open(f))
    name = f.split('/')[-1]
    verdict = r.get('verdict', r.get('status', '?'))
    print(f'{name}: {verdict}')
"

Track retries per service per agent type:
RETRY_FILE="$HOME/.claude/agent-memory/pipeline/retries-{service}-{agent}.txt"
RETRIES=$(cat "$RETRY_FILE" 2>/dev/null || echo "0")

On each failure:
RETRIES=$((RETRIES + 1))
echo "$RETRIES" > "$RETRY_FILE"
if [ "$RETRIES" -ge 3 ]; then
  echo "CIRCUIT BREAKER: {service}/{agent} failed 3 times"
  # Mark BLOCKED
  CLI="$HOME/dev/ops/adlc-v2/scripts/cli"
  bash $CLI/write-status.sh {service} {agent} BLOCKED "Circuit breaker: 3 failures"
  # Post to Slack DM immediately
  source ~/.env.adlc
  curl -sf -X POST "https://slack.com/api/chat.postMessage" \
    -H "Authorization: Bearer $SLACK_BOT_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(python3 -c "
import json
print(json.dumps({
    'channel': 'D0AGRAVEC1K',
    'text': ':rotating_light: CIRCUIT BREAKER -- {service}/{agent} failed 3 times\nLast error: {error_description}\nAction needed: Manual investigation required'
}))
")"
fi

On success, reset the retry counter:
echo "0" > "$RETRY_FILE"

| Pattern | Cause | Fix |
|---|---|---|
| Agent writes JSON to status file | Agent captured API response and wrote it raw | Use CLI tools only (lesson from 2026-03-21) |
| Agent invents status keywords | Agent uses SECURITY_PASS, SCENARIOS_READY, etc. | Use validate-status.sh --fix (lesson from 2026-03-23) |
| Agent cannot find spec | Wrong path convention | Glob for spec: ~/dev/specs/**/*spec* (lesson from 2026-03-23) |
| Agent OOM killed | Concurrent Rust builds | Max 1 Rust build at a time (lesson from 2026-03-20) |
| BA marks pending tasks as MISSING | BA doesn’t know which tasks are pending | Include completed task list in BA agent prompt |
| Dev agent leaves stub module | Stub not replaced when real module added | Agent prompt must include “remove stub, update all imports” (lesson from 2026-03-23) |
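The retry tracking and circuit-breaker steps described earlier can be wrapped in a few small functions so that every failure path updates the counter the same way. This is a sketch under stated assumptions: the names `retry_file`, `record_failure`, `breaker_tripped`, and `record_success` are hypothetical, and only the retry file naming and the threshold of 3 come from the document.

```shell
# Directory and threshold follow the document; the function names are invented.
RETRY_DIR="${RETRY_DIR:-$HOME/.claude/agent-memory/pipeline}"
MAX_RETRIES=3

# Path of the retry counter for one service/agent pair.
retry_file() {
  echo "$RETRY_DIR/retries-$1-$2.txt"
}

# Increment the counter for service $1 / agent $2 and print the new value.
record_failure() {
  mkdir -p "$RETRY_DIR"
  _rf="$(retry_file "$1" "$2")"
  _n="$(cat "$_rf" 2>/dev/null || echo 0)"
  _n=$(( ${_n:-0} + 1 ))
  echo "$_n" > "$_rf"
  echo "$_n"
}

# Succeeds (exit 0) once the pair has failed MAX_RETRIES times or more.
breaker_tripped() {
  _n="$(cat "$(retry_file "$1" "$2")" 2>/dev/null || echo 0)"
  [ "${_n:-0}" -ge "$MAX_RETRIES" ]
}

# Reset the counter after a successful run.
record_success() {
  mkdir -p "$RETRY_DIR"
  echo 0 > "$(retry_file "$1" "$2")"
}
```

After each failure the orchestrator would call `record_failure`, then check `breaker_tripped` to decide whether to mark the agent BLOCKED and post to Slack instead of spawning another fix attempt.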
No sleep calls between agent launches.

All agents MUST use CLI tools for output:
CLI="$HOME/dev/ops/adlc-v2/scripts/cli"
bash $CLI/write-status.sh {service} {agent} {STATUS} "{details}"
bash $CLI/write-review.sh {service} {agent_type} < report.json
bash $CLI/write-pipeline-state.sh {project} {service} {STATE} "{details}"
bash $CLI/write-lesson.sh {agent} {service} "{problem}" "{root_cause}" "{fix}" "{prevention}"

NEVER write these files directly with the Write or Edit tool; the CLI validates format and rejects invalid input.
- ls ~/dev/ops/outputs/{service}-*.status
- bash ~/dev/ops/adlc-v2/scripts/validate-status.sh
- cat ~/.claude/agent-memory/pipeline/retries-*.txt 2>/dev/null
- awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo (should report > 2000MB)

If an agent corrupts state:

1. Validate and fix status files: bash ~/dev/ops/adlc-v2/scripts/validate-status.sh --fix
2. Reset retry counter: echo "0" > ~/.claude/agent-memory/pipeline/retries-{service}-{agent}.txt
3. Clear stale RUNNING status: re-run the agent or manually set to previous stable state
4. If agent corrupted source code: cd ~/dev/projects/{service} && git checkout dev -- .

- ~/dev/ops/adlc-v2/scripts/cli/
- ~/.claude/agents/
- ~/dev/ops/adlc-v2/scripts/validate-status.sh
- ~/dev/ops/adlc-v2/scripts/dispatcher-v3.sh (write_status helper)
- ~/.claude/skills/check-pipeline/SKILL.md