Reconstruct ADLC pipeline state after a daily restart (4am systemd timer), server reboot, or manual session restart. Ensures all agents, services, and infrastructure are healthy before resuming autonomous operations.
Applies to the ADLC orchestrator session (ods-claude
tmux) on srv-agents (this machine). Covers context rebuild, interrupted
work recovery, resource validation, and pipeline loop start.

- Shell access to srv-agents (jniox_orbusdigital_com@odsgpcicd-srv-agents)
- tmux session ods-claude exists (created by systemd ods-claude.service)
- PostgreSQL listening on 127.0.0.1:5433 (ods-postgres container)
- Environment file ~/.env.adlc (SLACK_BOT_TOKEN, COOLIFY_API_URL)
- Agent memory directory ~/.claude/agent-memory/pipeline/

Check the service and timer:

    systemctl --user status ods-claude.service
    systemctl --user status ods-restart.timer

If stopped:

    systemctl --user start ods-claude.service
    systemctl --user start ods-restart.timer

Attach to the session:

    tmux attach -t ods-claude

The boot skill executes these steps automatically:

Step 3a – Load agent memory:

    for d in ~/.claude/agent-memory/*/; do
      echo "=== $(basename "$d") ==="
      head -20 "$d/MEMORY.md" 2>/dev/null
    done

Step 3b – Load project progress:

    for p in ~/dev/specs/*/gestion/progress.md; do
      PROJECT=$(basename "$(dirname "$(dirname "$p")")")
      echo "=== $PROJECT ==="
      tail -20 "$p"
    done

Step 3c – Check git state of all services:

    for d in ~/dev/projects/*/; do
      echo "--- $(basename "$d") ---"
      # git -C leaves the shell's working directory unchanged
      git -C "$d" log --oneline -3 2>/dev/null
      git -C "$d" status -s 2>/dev/null
    done

Step 3d – Detect interrupted work:

    grep -l "^RUNNING" ~/dev/ops/outputs/*.status 2>/dev/null

For each interrupted task: reset its status to the previous stable phase and re-queue it.

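
The detection step can be wrapped in a small function so that later steps can iterate over task names. This is a minimal sketch, assuming a status file marks an in-flight task with a line starting with RUNNING (as the grep pattern above implies). The function name `list_interrupted` and its directory argument are illustrative and not part of the boot skill. The reset and re-queue logic is left out because the status file format is not described here.

```shell
# Hypothetical helper: print the task name (file name without .status)
# for every status file that contains a line starting with RUNNING.
# Argument 1 (optional): directory to scan, default ~/dev/ops/outputs.
list_interrupted() {
  local dir="${1:-$HOME/dev/ops/outputs}"
  local f
  for f in "$dir"/*.status; do
    # When the glob matches nothing it stays literal; skip that case.
    [ -e "$f" ] || continue
    if grep -q '^RUNNING' "$f"; then
      basename "$f" .status
    fi
  done
}
```

Calling `list_interrupted` with no argument scans the default outputs directory; an empty result means no task was left in the RUNNING state.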
Step 3e – System resource check:

    free -h
    df -h /home
    docker ps --format "table {{.Names}}\t{{.Status}}" 2>/dev/null
    pg_isready -h 127.0.0.1 -p 5433 2>/dev/null && echo "PostgreSQL OK" || echo "PostgreSQL DOWN"

Step 3f – Validate status file integrity:

    bash ~/dev/ops/adlc-v2/scripts/validate-status.sh

If violations found:

    bash ~/dev/ops/adlc-v2/scripts/validate-status.sh --fix

Check staging health:

    for svc in oid docstore pdf-engine notification-hub workflow-engine form-engine; do
      # No -f and no fallback echo: curl already prints 000 when it gets no HTTP response,
      # so adding "|| echo 000" would append a second code to the first.
      code=$(curl -s -o /dev/null -w "%{http_code}" "https://${svc}.staging.orbusdigital.com/health" 2>/dev/null)
      echo "$svc: $code"
    done
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://ods-dashboard.staging.orbusdigital.com/api/health" 2>/dev/null)
    echo "ods-dashboard: $code"

Start the pipeline loop:

    /loop 5m /check-pipeline

Post to the ADLC channel (C0AN0N8AUGZ) with:
- Number of active projects and their phases
- Any interrupted work that was re-queued
- System resource status (RAM, disk)
- Staging health status
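
The summary fields listed above could be assembled into a plain-text message before posting. The following is a sketch only: the function names `mem_available_mb` and `build_boot_summary`, the argument order, and the message wording are all assumptions, and the Slack call itself is left out because the posting mechanism is not described in this runbook.

```shell
# Hypothetical helper: read available memory in MB from a meminfo-style file.
# Argument 1 (optional): path to the file, default /proc/meminfo.
mem_available_mb() {
  awk '/^MemAvailable:/ {print int($2/1024)}' "${1:-/proc/meminfo}"
}

# Hypothetical helper: build the boot summary text.
# Arguments: active project count, re-queued task count,
#            available RAM in MB, free disk space, staging status text.
build_boot_summary() {
  local projects="$1" requeued="$2" mem_mb="$3" disk_free="$4" staging="$5"
  printf '%s\n' "ADLC boot summary"
  printf '%s\n' "Active projects: $projects"
  printf '%s\n' "Re-queued tasks: $requeued"
  printf '%s\n' "RAM available: $mem_mb MB"
  printf '%s\n' "Disk free (/home): $disk_free"
  printf '%s\n' "Staging health: $staging"
}
```

A caller might pass `"$(mem_available_mb)"` as the third argument and the free-space column of `df -h /home` as the fourth. Keeping the builder free of system calls makes it easy to check the message format in isolation.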

- tmux ls shows ods-claude session running
- grep "RUNNING" ~/dev/ops/outputs/*.status returns no stale entries
- head -5 ~/.claude/agent-memory/pipeline/state.md
- awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo (prints available memory in MB)

If boot fails:
1. Check systemd journal: journalctl --user -u ods-claude.service -n 50
2. Kill orphan Claude processes: pkill -f "claude" && sleep 5 (the -f pattern matches any command line containing "claude", so review the matches with pgrep -af claude first)
3. Restart the service: systemctl --user restart ods-claude.service
4. If PostgreSQL is down: docker restart ods-postgres
5. If Redpanda is down: docker restart redpanda
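
After a container restart, the service inside may need a few seconds before it accepts connections, so a single readiness check right after the restart can fail spuriously. The sketch below shows a generic retry helper; the name `retry` and the attempt and delay values in the usage comment are illustrative assumptions, not part of the existing scripts.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example use after step 4 (shown as a comment, not executed here):
#   docker restart ods-postgres
#   retry 10 3 pg_isready -h 127.0.0.1 -p 5433
```

The same helper could wrap a staging health check or any other command that reports readiness through its exit status.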

- ~/.claude/skills/boot/SKILL.md
- ~/.claude/skills/check-pipeline/SKILL.md
- ~/dev/ops/adlc-v2/scripts/dispatcher-v3.sh
- ~/.claude/agent-memory/pipeline/state.md