Veille System — Full Audit

ODS Innovation · Daily Technology Watch · 2026-04-07

The veille is ODS Platform's autonomous daily tech watch — scanning releases, CVEs, trending repos, and tools relevant to the stack. It runs at 07:00 UTC via systemd timer, producing structured JSON+HTML findings with Slack notifications and Google Drive sync. This audit covers 15 days of operation (Mar 23–Apr 7) across all system components.

Days Producing: 87%
Fully Correct: 67%
Critical Issues: 3
Avg Findings/Day: ~12
Deps Tracked: 19
Repos Reviewed: 40
1 — System Overview

How It Works

The veille system is a three-layer architecture: a bash dispatcher triggered by systemd injects commands into a Claude Code tmux session, which spawns a specialized veille agent. The agent uses WebSearch/WebFetch to scan 19 dependencies, GitHub advisories, and trending repos. Outputs are written deterministically via CLI tools (write-finding.sh and write-daily-summary.sh) that produce validated JSON and ODS-branded HTML. Results are synced to Google Drive and posted to Slack #Innovation.
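The hand-off from the dispatcher to the tmux session can be sketched as follows. This is a minimal Python illustration, not the actual dispatcher (which is bash); the function name and the prompt text are hypothetical, and the session name `ods-claude` is taken from the incident notes later in this audit.

```python
from typing import List


def build_inject_commands(session: str, prompt: str) -> List[List[str]]:
    """Return the two tmux invocations that type `prompt` into `session`.

    Hypothetical helper: it only builds the argv lists and does not check
    that the agent actually started.
    """
    return [
        # -l sends the text literally, so a word like "Enter" inside the
        # prompt is typed rather than interpreted as a key name.
        ["tmux", "send-keys", "-t", session, "-l", prompt],
        # A second call presses Enter to submit the typed prompt.
        ["tmux", "send-keys", "-t", session, "Enter"],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)`. A zero exit code from tmux only means the keys were delivered to the session, not that the agent picked them up, which is why this pattern is fire-and-forget.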

2 — Architecture Flow

Daily Run & Ad-Hoc Flows

3 — Components Inventory

System Components

Component | Path | Role | Health
Agent Definition | ~/.claude/agents/veille.md | Agent instructions (405 lines) | OK
Dispatcher | ~/dev/ops/innovation/scripts/dispatcher-innovation.sh | Daily orchestrator (160 lines) | OK
Slack Bridge | ~/dev/ops/innovation/scripts/innovation-slack-bridge.sh | Slack poller (30s interval) | OK
write-finding.sh | ~/dev/ops/adlc-v2/scripts/cli/write-finding.sh | Per-finding JSON+HTML writer | OK
write-daily-summary.sh | ~/dev/ops/adlc-v2/scripts/cli/write-daily-summary.sh | Daily aggregation JSON+HTML | Warn: Empty HTML on 2 days
SystemD Timer | ~/.config/systemd/user/ods-innovation.timer | Trigger (daily 07:00 UTC) | Active
SystemD Service | ~/.config/systemd/user/ods-innovation.service | Runs dispatcher | OK
Innovation Bridge Skill | ~/.claude/skills/innovation-bridge/SKILL.md | Routes Slack commands | OK
last-versions.json | ~/dev/ops/innovation/last-versions.json | 19 tracked dependencies | OK
reviewed-repos.json | ~/dev/ops/innovation/reviewed-repos.json | 40 repos tracked | OK
10 components audited: 8 OK · 1 Warn · 1 Active
4 — Operational Data (15 Days)

Daily Run History

Date | Findings | Summary JSON | Summary HTML | Status
2026-03-23 | Yes | Yes | Yes | Done
2026-03-24 | Yes | Yes | Yes | Done
2026-03-25 | Yes | Yes | Yes | Done
2026-03-26 | Yes | No | No | No summary
2026-03-27 | Yes | Yes | 0 bytes | Empty HTML
2026-03-28 | Yes | Yes | Yes | Done
2026-03-29 | Yes | Yes | 0 bytes | Empty HTML
2026-03-30 | Yes | Yes | Yes | Done
2026-03-31 | Yes | Yes | Yes | Done
2026-04-01 | 14 | Yes | Yes | Done
2026-04-02 | 15 | Yes | Yes | Done
2026-04-03 | 13 | Yes | Yes | Done
2026-04-04 | 33 | Yes | Yes | Done
2026-04-05 | 0 | No | No | Failed
2026-04-06 | 10 | Yes | Yes | Done
2026-04-07 | 0 | No | No | Stalled 3h+
16 days tracked (Mar 23 – Apr 7): 13 OK · 1 failed · 2 partial
5 — Critical Issues

Must Fix Immediately

C1 — CRITICAL
Today's run (Apr 7) stalled — RUNNING with no output for 3+ hours
  • Status file says RUNNING since 07:02:32 UTC, but no findings directory created
  • The dispatcher injected at 07:02, but the ods-claude tmux session was restarted at 10:07 (after injection). The command was lost.
  • Impact: Today's entire tech watch is missing
  • Fix: Add a watchdog — if veille status is RUNNING for >2 hours with no findings, post alert to Slack DM and re-trigger
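The C1 failure mode (a session restart after injection) can be detected by comparing timestamps. The sketch below is illustrative only: it assumes that "restarted" means the tmux session was recreated, so its `session_created` time (a standard tmux format variable) moves past the injection time. Both function names are hypothetical, and the input is assumed to come from `tmux list-sessions -F '#{session_name} #{session_created}'`.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_session_created(list_sessions_output: str, session: str) -> Optional[datetime]:
    """Find `session` in '<name> <epoch>' lines and return its creation time (UTC)."""
    for line in list_sessions_output.splitlines():
        name, _, epoch = line.rpartition(" ")
        if name == session and epoch.isdigit():
            return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return None  # session not present in the output


def injection_lost(injected_at: datetime, session_created_at: datetime) -> bool:
    """True if the session was created after the command was injected."""
    return session_created_at > injected_at
```

With the Apr 7 timeline (injection at 07:02:32 UTC, session recreated at 10:07 UTC), `injection_lost` returns `True`, which a watchdog could treat as a signal to re-trigger.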
C2 — CRITICAL
Apr 5 — complete failure, zero findings produced
  • Findings directory exists but is empty (0 findings). No daily summary generated.
  • Dispatcher log: "already triggered for 2026-04-05 — skipping" — a review status file existed before the dispatcher ran
  • Apr 6 dispatcher: "no summary found for 2026-04-05"
  • Root cause: Agent was triggered ad-hoc (status set to TRIGGERED), but never completed
  • Fix: Dispatcher should check for completion (not just trigger status). Distinguish TRIGGERED vs COMPLETED.
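The TRIGGERED vs COMPLETED distinction proposed for C2 could look roughly like this. It is a sketch under assumptions: `TRIGGERED` and `RUNNING` appear in this audit, `COMPLETED` is the proposed new state, and `NONE` (no status file yet) plus the `stale` flag are hypothetical additions for illustration.

```python
from enum import Enum


class RunStatus(Enum):
    NONE = "NONE"            # hypothetical: no status file exists for the day
    TRIGGERED = "TRIGGERED"  # command injected, agent not confirmed started
    RUNNING = "RUNNING"      # agent reported that it started
    COMPLETED = "COMPLETED"  # proposed: findings and summary were written


def should_trigger(status: RunStatus, stale: bool) -> bool:
    """Decide whether the dispatcher should (re-)trigger the day's run."""
    if status is RunStatus.COMPLETED:
        return False  # the day is done, never run it twice
    if status is RunStatus.NONE:
        return True   # nothing has happened yet
    # TRIGGERED or RUNNING: only re-trigger once the run is considered stale
    return stale
```

Under this rule, the Apr 5 situation (status TRIGGERED, never completed) would no longer be skipped once the run is judged stale.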
C3 — CRITICAL
Output directory sprawl — 4 different locations
Reports are scattered across 4 uncoordinated paths:
  • ~/dev/ops/innovation/findings/ + daily/ — canonical (from CLI tools)
  • ~/dev/ops/reports/veille/veille-YYYY-MM-DD.json/html
  • ~/dev/ops/reports/veille-YYYY-MM-DD.json/html — no subdirectory
  • ~/dev/ops/veille/reports/ — yet another schema
Impact: No single source of truth. Confusing for dashboards, agents, and humans.
Fix: Consolidate to ~/dev/ops/innovation/ only. Update agent definition and CLI tools.
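Before consolidating, it may help to inventory which locations hold a copy of a given day's report. The following is a minimal sketch: the function name is hypothetical, `ops_root` stands for `~/dev/ops`, and it assumes `daily/` lives under `innovation/` and that every report file or directory name contains the date as `YYYY-MM-DD`.

```python
from pathlib import Path
from typing import Dict, List


def find_report_copies(ops_root: Path, date: str) -> Dict[str, List[Path]]:
    """Map each known output location to the entries whose name contains `date`."""
    locations = {
        "innovation/findings": ops_root / "innovation" / "findings",
        "innovation/daily": ops_root / "innovation" / "daily",
        "reports/veille": ops_root / "reports" / "veille",
        "reports": ops_root / "reports",
        "veille/reports": ops_root / "veille" / "reports",
    }
    copies: Dict[str, List[Path]] = {}
    for label, directory in locations.items():
        if directory.is_dir():
            # Non-recursive match, so files under reports/veille are not
            # counted a second time under reports.
            copies[label] = sorted(directory.glob(f"*{date}*"))
        else:
            copies[label] = []
    return copies
```

Calling `find_report_copies(Path.home() / "dev" / "ops", "2026-04-01")` would show at a glance which paths still receive output and need to be migrated.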
6 — High Issues

Significant Problems

H1 — HIGH
Empty HTML summaries (Mar 27, Mar 29)
2026-03-27-summary.html and 2026-03-29-summary.html are 0 bytes. JSON summaries exist and are valid.
Root cause: write-daily-summary.sh redirects the stdout of its Python HTML generator to the output file. The shell creates the file before Python runs, so if Python crashes the file is left empty.
Fix: Verify HTML >0 bytes after generation; regenerate if empty.
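One way to avoid empty output files is to render first and move the result into place only when it is non-empty. The sketch below shows the idea in Python; `write_html_safely` and the `render` callback are hypothetical names, not part of write-daily-summary.sh.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_html_safely(target: Path, render: Callable[[], str]) -> bool:
    """Write render() to `target` atomically; return False if the HTML is empty."""
    html = render()  # if this raises, `target` is never touched
    if not html.strip():
        return False  # refuse to create or overwrite with an empty file
    # Write to a temp file in the same directory so os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(html)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the partial temp file before propagating the error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

The same pattern applies in bash: redirect to a temporary file, test it with `[ -s file ]`, then `mv` it over the final path.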
H2 — HIGH
No completion watchdog — fire-and-forget design
The dispatcher triggers the veille agent into tmux (fire-and-forget). No mechanism detects if the agent never started, crashed mid-run, or got stuck. This is the root cause of C1 and C2.
Fix: Add a watchdog cron that checks every 30 minutes: if status == RUNNING for >90 minutes AND no findings exist, alert and re-trigger.
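The stall condition described for the watchdog can be expressed as a small predicate. This is a sketch only: the function name and its parameters are hypothetical, and reading the status file, counting findings, alerting, and re-triggering are left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the proposed fix: RUNNING for more than 90 minutes.
STALL_THRESHOLD = timedelta(minutes=90)


def is_stalled(status: str, started_at: datetime, now: datetime, findings_count: int) -> bool:
    """True when a run has been RUNNING past the threshold with no findings."""
    return (
        status == "RUNNING"
        and now - started_at > STALL_THRESHOLD
        and findings_count == 0
    )
```

Run from cron every 30 minutes, this check would have flagged the Apr 7 run (RUNNING since 07:02:32 UTC, no findings) at the first check after 08:32 UTC.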
H3 — HIGH
Dispatcher references wrong agent path
Dispatcher line 71 injects "Read ~/dev/ops/innovation/agents/veille.md", but the actual agent file is at ~/.claude/agents/veille.md, which the Claude agent spawner reads automatically.
Fix: Update dispatcher line 71 to remove the incorrect path reference.
7 — Medium & Low Issues

Improvements

M1: No cron fallback for systemd timer
Only systemd timer triggers the veille. If systemd user service manager fails, veille stops entirely. No cron entry exists. Fix: Add cron as backup: 5 7 * * * bash ~/dev/ops/innovation/scripts/dispatcher-innovation.sh
M2: Missing Mar 26 summary — never generated
Mar 26 has findings but no summary (neither JSON nor HTML). Fix: Run write-daily-summary.sh 2026-03-26 to backfill.
M3: Fragile Slack JSON construction
Both dispatcher and write-daily-summary.sh use inline Python for Slack JSON. Line 221 passes "$MSG" with unescaped newlines. Special characters in finding titles could break the JSON. Fix: Use python3 -c "import json; print(json.dumps(...))" consistently.
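A minimal sketch of the `json.dumps` approach is shown below. The `{"text": ...}` shape is the basic Slack message payload; whether the scripts post through an incoming webhook or another API is not stated in this audit, so the function name and payload shape are assumptions.

```python
import json


def build_slack_payload(message: str) -> str:
    """Serialize a message so quotes, backslashes, and newlines are escaped."""
    return json.dumps({"text": message})
```

Because the serializer handles escaping, a finding title containing quotes or line breaks produces valid JSON instead of a broken request body.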
M4: Summary generated before veille agent runs
dispatcher-innovation.sh line 80 runs write-daily-summary.sh immediately after triggering the veille agent, before any findings are produced, so the summary is built from stale or missing data. Fix: Remove line 80 and let the agent call write-daily-summary.sh at the end of its run.
M5: Agent memory scattered across 8+ locations
.claude/agent-memory/veille/ exists under home, dev, and 4+ project directories with different MEMORY.md content. Agent learning doesn't consolidate. Fix: Centralize to ~/.claude/agent-memory/veille/MEMORY.md.
L1: No retry enforcement on GitHub API rate limits
Agent definition mentions "retry once on 403" but there's no enforcement mechanism. Acceptable risk — document in runbook.
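If enforcement is ever wanted, "retry once on 403" could be wrapped as below. This is a hypothetical helper for illustration: `fetch` is any callable returning an HTTP status code, and the 60-second default wait is an assumption, not a value from the agent definition.

```python
import time
from typing import Callable


def fetch_with_single_retry(
    fetch: Callable[[], int],
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch(); on a 403, wait and try exactly one more time."""
    status = fetch()
    if status == 403:
        sleep(wait_seconds)
        status = fetch()  # second and final attempt
    return status
```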
L2: No GDrive sync validation
rclone exit code is checked but no validation that files arrived. Low priority — rclone is reliable.
L3: reviewed-repos.json has no schema validation
40 repos tracked with no validation of entry format. Potential for corrupted entries. Fix: Add JSON schema check when updating repo tracker.
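A lightweight structural check could run before each write to the tracker. The entry format of reviewed-repos.json is not documented in this audit, so the sketch assumes a top-level list of objects and takes the required keys as a parameter; the key names in the usage note are purely hypothetical.

```python
from typing import Any, Iterable, List


def find_invalid_entries(entries: Any, required_keys: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means the data passed."""
    required = list(required_keys)
    if not isinstance(entries, list):
        return ["top level must be a list of entries"]
    problems: List[str] = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for key in required:
            value = entry.get(key)
            # Each required key must hold a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{key}'")
    return problems
```

For example, `find_invalid_entries(json.load(f), ["url", "reviewed_at"])` would report entries lacking either field, assuming those are the fields the tracker actually uses.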
8 — Reliability Metrics

15-Day Assessment

Metric | Value | Assessment
Days with findings produced | 13/15 (87%) | Good
Days with complete summary (JSON+HTML) | 11/15 (73%) | Needs work
Days with all outputs correct | 10/15 (67%) | Needs work
SystemD timer active | Yes | Good
Avg findings per day | ~12 | Good
Dependencies tracked | 19 | Good
Repos reviewed | 40 | Good
Slack notifications | Working | Good
GDrive sync | Working | Good
Duplicate output locations | 4 dirs | Bad
Watchdog / alerting | None | Bad
11 metrics reviewed: 7 Good · 2 Warn · 2 Bad
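The percentages in the table follow from the stated counts by rounding to the nearest whole percent, as this short check illustrates (the helper name is hypothetical):

```python
def percent(part: int, total: int) -> int:
    """Share of `part` in `total`, rounded to the nearest whole percent."""
    return round(100 * part / total)


rates = {
    "findings produced": percent(13, 15),
    "complete summary": percent(11, 15),
    "all outputs correct": percent(10, 15),
}
```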
9 — Recommended Fixes

Priority Order

  1. Add completion watchdog — cron every 30min that checks for stalled runs, alerts and re-triggers
  2. Consolidate output directories — single canonical location ~/dev/ops/innovation/
  3. Fix dispatcher line 80 — don't generate summary before agent runs
  4. Fix dispatcher line 71 — correct agent path reference
  5. Add empty-file detection — verify HTML >0 bytes in write-daily-summary.sh
  6. Add cron fallback — for systemd timer redundancy
  7. Backfill missing summaries — run write-daily-summary.sh for Mar 26
  8. Centralize agent memory — single ~/.claude/agent-memory/veille/MEMORY.md
  9. Harden Slack JSON — use json.dumps() consistently for all Slack payloads
10 — System Strengths

What Works Well

Well-designed agent definition with comprehensive 405-line instructions
Deterministic CLI tools with proper validation, enum checks, and ODS-branded HTML
Clean 3-layer separation: dispatcher (bash) → agent (Claude) → CLI tools (bash)
Good data schema — findings JSON includes security reviews, supply chain risk, license checks
Systemd timer is reliable, persistent, and handles missed runs
Bidirectional Slack integration (post summaries + receive ad-hoc URLs via bridge)
Google Drive sync provides offsite backup of all reports
19 dependencies tracked with version comparison against last-versions.json
40 repos reviewed with deduplication via reviewed-repos.json tracker