Veille System — Full Audit

ODS Innovation · Daily Technology Watch · 2026-04-07

The veille is ODS Platform's autonomous daily tech watch — scanning releases, CVEs, trending repos, and tools relevant to the stack. It runs at 07:00 UTC via systemd timer, producing structured JSON+HTML findings with Slack notifications and Google Drive sync. This audit covers 15 days of operation (Mar 23–Apr 7) across all system components.

Metric	Value
Days Producing	87%
Fully Correct	67%
Critical Issues	3
Avg Findings/Day	~12
Deps Tracked	19
Repos Reviewed	40

1 — System Overview

How It Works

The veille system is a three-layer architecture: a bash dispatcher triggered by systemd injects commands into a Claude Code tmux session, which spawns a specialized veille agent. The agent uses WebSearch/WebFetch to scan 19 dependencies, GitHub advisories, and trending repos. Outputs are written deterministically via CLI tools (write-finding.sh and write-daily-summary.sh) that produce validated JSON and ODS-branded HTML. Results are synced to Google Drive and posted to Slack #Innovation.
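The version-tracking step can be illustrated with a minimal sketch. It assumes last-versions.json is a flat JSON object mapping dependency name to version string; the real schema is not shown in this audit, and the function names and the dep-a/dep-b entries are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def load_tracked(path: Path) -> Dict[str, str]:
    """Load the tracker file (assumed shape: {"name": "version", ...})."""
    return json.loads(path.read_text(encoding="utf-8"))


def find_updates(tracked: Dict[str, str],
                 latest: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old, new)} for every tracked dependency whose version changed."""
    return {
        name: (old, latest[name])
        for name, old in tracked.items()
        if name in latest and latest[name] != old
    }


# Hypothetical entries, for illustration only.
tracked = {"dep-a": "1.0.0", "dep-b": "2.1.0"}
latest = {"dep-a": "1.1.0", "dep-b": "2.1.0"}
print(find_updates(tracked, latest))
```

Only changed entries are reported, so an unchanged dependency produces no finding.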

2 — Architecture Flow

Daily Run & Ad-Hoc Flows


3 — Components Inventory

System Components

Component	Path	Role	Health
Agent Definition	`~/.claude/agents/veille.md`	Agent instructions (405 lines)	OK
Dispatcher	`~/dev/ops/innovation/scripts/dispatcher-innovation.sh`	Daily orchestrator (160 lines)	OK
Slack Bridge	`~/dev/ops/innovation/scripts/innovation-slack-bridge.sh`	Slack poller (30s interval)	OK
write-finding.sh	`~/dev/ops/adlc-v2/scripts/cli/write-finding.sh`	Per-finding JSON+HTML writer	OK
write-daily-summary.sh	`~/dev/ops/adlc-v2/scripts/cli/write-daily-summary.sh`	Daily aggregation JSON+HTML	Warn: empty HTML on 2 days
SystemD Timer	`~/.config/systemd/user/ods-innovation.timer`	Trigger (daily 07:00 UTC)	Active
SystemD Service	`~/.config/systemd/user/ods-innovation.service`	Runs dispatcher	OK
Innovation Bridge Skill	`~/.claude/skills/innovation-bridge/SKILL.md`	Routes Slack commands	OK
last-versions.json	`~/dev/ops/innovation/last-versions.json`	19 tracked dependencies	OK
reviewed-repos.json	`~/dev/ops/innovation/reviewed-repos.json`	40 repos tracked	OK
10 components audited			8 OK · 1 Warn · 1 Active

4 — Operational Data (15 Days)

Daily Run History

Date	Findings	Summary JSON	Summary HTML	Status
2026-03-23	Yes	Yes	Yes	Done
2026-03-24	Yes	Yes	Yes	Done
2026-03-25	Yes	Yes	Yes	Done
2026-03-26	Yes	No	No	No summary
2026-03-27	Yes	Yes	0 bytes	Empty HTML
2026-03-28	Yes	Yes	Yes	Done
2026-03-29	Yes	Yes	0 bytes	Empty HTML
2026-03-30	Yes	Yes	Yes	Done
2026-03-31	Yes	Yes	Yes	Done
2026-04-01	14	Yes	Yes	Done
2026-04-02	15	Yes	Yes	Done
2026-04-03	13	Yes	Yes	Done
2026-04-04	33	Yes	Yes	Done
2026-04-05	0	No	No	Failed
2026-04-06	10	Yes	Yes	Done
2026-04-07	0	No	No	Stalled 3h+
16 days tracked (Mar 23 – Apr 7)			11 done · 3 partial · 1 failed · 1 stalled

5 — Critical Issues

Must Fix Immediately

C1 — CRITICAL

Today's run (Apr 7) stalled — RUNNING with no output for 3+ hours

Status file says RUNNING since 07:02:32 UTC, but no findings directory created
The dispatcher injected at 07:02, but the ods-claude tmux session was restarted at 10:07 (after injection). The command was lost.
Impact: Today's entire tech watch is missing
Fix: Add a watchdog — if veille status is RUNNING for >2 hours with no findings, post alert to Slack DM and re-trigger

C2 — CRITICAL

Apr 5 — complete failure, zero findings produced

Findings directory exists but is empty (0 findings). No daily summary generated.
Dispatcher log: "already triggered for 2026-04-05 — skipping" — a review status file existed before the dispatcher ran
Apr 6 dispatcher: "no summary found for 2026-04-05"
Root cause: Agent was triggered ad-hoc (status set to TRIGGERED), but never completed
Fix: Dispatcher should check for completion (not just trigger status). Distinguish TRIGGERED vs COMPLETED.
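The skip decision described in the fix could look like the sketch below. It assumes the status file holds one of TRIGGERED, RUNNING, or COMPLETED and that its age in minutes is available; the function name and the 90-minute staleness default are illustrative, not taken from the dispatcher.

```python
from typing import Optional


def should_trigger(status: Optional[str], age_minutes: float,
                   stale_after: float = 90.0) -> bool:
    """Decide whether the dispatcher should start today's run.

    status is None when no status file exists for today.
    """
    if status is None:
        return True                    # nothing has run yet
    if status == "COMPLETED":
        return False                   # today's run actually finished
    # TRIGGERED or RUNNING: skip only while the earlier attempt is still fresh.
    return age_minutes > stale_after


print(should_trigger("TRIGGERED", age_minutes=300))  # stale ad-hoc trigger, as on Apr 5
```

With this rule, a leftover TRIGGERED file from an ad-hoc run no longer blocks the daily run indefinitely.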

C3 — CRITICAL

Output directory sprawl — 4 different locations

Reports are scattered across 4 uncoordinated paths:

~/dev/ops/innovation/findings/ + daily/ — canonical (from CLI tools)
~/dev/ops/reports/veille/ — veille-YYYY-MM-DD.json/html
~/dev/ops/reports/veille-YYYY-MM-DD.json/html — no subdirectory
~/dev/ops/veille/reports/ — yet another schema

Impact: No single source of truth. Confusing for dashboards, agents, and humans.
Fix: Consolidate to ~/dev/ops/innovation/ only. Update agent definition and CLI tools.
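Before consolidating, it helps to inventory what sits in the three non-canonical locations. The sketch below is read-only and assumes the veille-* naming listed above for the first two paths; file naming under ~/dev/ops/veille/reports/ is unknown, so every JSON or HTML file there is listed. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# (directory relative to home, glob pattern); the last pattern is a guess.
LEGACY_LOCATIONS = [
    ("dev/ops/reports/veille", "veille-*"),
    ("dev/ops/reports", "veille-*"),
    ("dev/ops/veille/reports", "*"),
]


def find_legacy_reports(home: Path) -> List[Path]:
    """List JSON/HTML report files found outside the canonical directory."""
    found: List[Path] = []
    for rel_dir, pattern in LEGACY_LOCATIONS:
        base = home / rel_dir
        if not base.is_dir():
            continue                   # location absent on this host
        found.extend(
            sorted(p for p in base.glob(pattern)
                   if p.is_file() and p.suffix in {".json", ".html"})
        )
    return found


for report in find_legacy_reports(Path.home()):
    print(report)
```

The listing can then drive a manual or scripted move into ~/dev/ops/innovation/.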

6 — High Issues

Significant Problems

H1 — HIGH

Empty HTML summaries (Mar 27, Mar 29)

2026-03-27-summary.html and 2026-03-29-summary.html are 0 bytes. JSON summaries exist and are valid. Root cause: write-daily-summary.sh redirects the stdout of its Python HTML generation to the output file. The shell creates the file before Python runs, so a Python crash leaves a 0-byte file behind. Fix: Verify HTML >0 bytes after generation; regenerate if empty.
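One way to avoid 0-byte files is to render first and move the result into place only when it is non-empty. This is a sketch of that pattern, not the script's current code; write_nonempty and the render callable are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_nonempty(path: Path, render: Callable[[], str]) -> None:
    """Write render() to path atomically, refusing empty output."""
    html = render()                    # if this raises, path is never touched
    if not html.strip():
        raise ValueError(f"refusing to write empty HTML to {path}")
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(html)
        os.replace(tmp, path)          # atomic when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)             # leave no partial file behind
        raise
```

A crash during rendering then leaves either the previous summary or no file at all, never an empty one.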

H2 — HIGH

No completion watchdog — fire-and-forget design

The dispatcher triggers the veille agent into tmux (fire-and-forget). No mechanism detects if the agent never started, crashed mid-run, or got stuck. This is the root cause of C1 and C2. Fix: Add a watchdog cron that checks every 30 minutes: if status == RUNNING for >90 minutes AND no findings exist, alert and re-trigger.
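The watchdog condition can be expressed as a small pure function. This sketch assumes the status value and start time have already been read from the status file (its format is not shown in this audit); is_stalled is a hypothetical name, and the 90-minute default mirrors the threshold proposed above.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def is_stalled(status: str, started_at: datetime, findings_dir: Path,
               now: datetime,
               max_age: timedelta = timedelta(minutes=90)) -> bool:
    """True when a run has been RUNNING longer than max_age with no findings."""
    if status != "RUNNING":
        return False
    has_findings = findings_dir.is_dir() and any(findings_dir.iterdir())
    return (now - started_at) > max_age and not has_findings


# The Apr 7 stall: RUNNING since 07:02:32 UTC, checked at 10:07 UTC.
started = datetime(2026, 4, 7, 7, 2, 32, tzinfo=timezone.utc)
now = datetime(2026, 4, 7, 10, 7, tzinfo=timezone.utc)
print(is_stalled("RUNNING", started, Path("/nonexistent/findings"), now))
```

Alerting and re-triggering are left to the caller, so the check itself stays free of side effects and is easy to test.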

H3 — HIGH

Dispatcher references wrong agent path

Line 71 injects: Read ~/dev/ops/innovation/agents/veille.md. But the actual agent file is at ~/.claude/agents/veille.md. The Claude agent spawner reads from ~/.claude/agents/ automatically. Fix: Update dispatcher line 71 to remove the incorrect path reference.

7 — Medium & Low Issues

Improvements

M1: No cron fallback for systemd timer

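Only the systemd timer triggers the veille. If the systemd user service manager fails, the veille stops entirely. No cron entry exists. Fix: Add cron as backup: 5 7 * * * bash ~/dev/ops/innovation/scripts/dispatcher-innovation.sh (cron evaluates the schedule in the host's local time zone, so this fires at 07:05 UTC only if the host runs on UTC).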

M2: Missing Mar 26 summary — never generated

Mar 26 has findings but no summary, neither JSON nor HTML. Fix: Run write-daily-summary.sh 2026-03-26 to backfill.

M3: Fragile Slack JSON construction

Both dispatcher and write-daily-summary.sh use inline Python for Slack JSON. Line 221 passes "$MSG" with unescaped newlines. Special characters in finding titles could break the JSON. Fix: Use python3 -c "import json; print(json.dumps(...))" consistently.
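A minimal sketch of the json.dumps approach follows. The payload shape (a single text field) is an assumption, since the audit does not show what the scripts actually send, and slack_payload is a hypothetical helper name.

```python
import json


def slack_payload(text: str) -> str:
    """Serialize a message; json.dumps escapes quotes, backslashes, and newlines."""
    return json.dumps({"text": text})


msg = 'New finding: "dep-a" 1.1.0\nSeverity: high \\ review needed'
payload = slack_payload(msg)
print(payload)
```

The message should reach Python as data, for example through stdin or an environment variable, and not be interpolated into the python3 -c source string, where shell quoting can still break it.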

M4: Summary generated before veille agent runs

dispatcher-innovation.sh line 80 runs write-daily-summary.sh immediately after triggering the veille agent, before any findings exist, so the summary is built from stale or missing data. Fix: Remove line 80. Let the agent call it at the end of its run.

M5: Agent memory scattered across 8+ locations

.claude/agent-memory/veille/ exists under home, dev, and 4+ project directories with different MEMORY.md content. Agent learning doesn't consolidate. Fix: Centralize to ~/.claude/agent-memory/veille/MEMORY.md.

L1: No retry enforcement on GitHub API rate limits

Agent definition mentions "retry once on 403" but there's no enforcement mechanism. Acceptable risk — document in runbook.
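If enforcement is wanted later, the "retry once on 403" rule could be wrapped as in the sketch below. It assumes the fetch step is a callable that returns a (status, body) pair; fetch_with_retry, the 60-second delay, and the injectable sleep are illustrative choices, not part of the agent definition.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(fetch: Callable[[], Tuple[int, str]],
                     retries: int = 1,
                     delay_s: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch(); on HTTP 403, wait and retry up to `retries` more times."""
    status, body = fetch()
    for _ in range(retries):
        if status != 403:
            break
        sleep(delay_s)                 # back off before the retry
        status, body = fetch()
    return status, body
```

The caller still has to handle a final 403, for example by recording the source as skipped for the day.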

L2: No GDrive sync validation

The rclone exit code is checked, but nothing validates that the files actually arrived. Low priority: rclone is reliable.

L3: reviewed-repos.json has no schema validation

40 repos tracked with no validation of entry format. Potential for corrupted entries. Fix: Add JSON schema check when updating repo tracker.
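A validation pass could be as small as the sketch below. The entry shape is an assumption: the audit does not show the format of reviewed-repos.json, so the repo and reviewed_at fields, like the validate_entry name, are hypothetical.

```python
import re
from datetime import date
from typing import Any, List

REPO_RE = re.compile(r"^[\w.-]+/[\w.-]+$")   # owner/name


def validate_entry(entry: Any) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    if not isinstance(entry, dict):
        return ["entry is not an object"]
    errors: List[str] = []
    repo = entry.get("repo")
    if not isinstance(repo, str) or not REPO_RE.match(repo):
        errors.append(f"bad repo: {repo!r}")
    try:
        date.fromisoformat(entry.get("reviewed_at", ""))
    except (TypeError, ValueError):
        errors.append(f"bad reviewed_at: {entry.get('reviewed_at')!r}")
    return errors


print(validate_entry({"repo": "owner/name", "reviewed_at": "2026-04-01"}))
```

Running this over every entry before saving the tracker would reject a corrupted update instead of persisting it.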

8 — Reliability Metrics

15-Day Assessment

Metric	Value	Assessment
Days with findings produced	13/15 (87%)	Good
Days with complete summary (JSON+HTML)	11/15 (73%)	Needs work
Days with all outputs correct	10/15 (67%)	Needs work
SystemD timer active	Yes	Good
Avg findings per day	~12	Good
Dependencies tracked	19	Good
Repos reviewed	40	Good
Slack notifications	Working	Good
GDrive sync	Working	Good
Duplicate output locations	4 dirs	Bad
Watchdog / alerting	None	Bad
11 metrics reviewed		7 Good · 2 Needs work · 2 Bad

9 — Recommended Fixes

Priority Order

Add completion watchdog — cron every 30min that checks for stalled runs, alerts and re-triggers
Consolidate output directories — single canonical location ~/dev/ops/innovation/
Fix dispatcher line 80 — don't generate summary before agent runs
Fix dispatcher line 71 — correct agent path reference
Add empty-file detection — verify HTML >0 bytes in write-daily-summary.sh
Add cron fallback — for systemd timer redundancy
Backfill missing summaries — run write-daily-summary.sh for Mar 26
Centralize agent memory — single ~/.claude/agent-memory/veille/MEMORY.md
Harden Slack JSON — use json.dumps() consistently for all Slack payloads

10 — System Strengths

What Works Well

Well-designed agent definition with comprehensive 405-line instructions

Deterministic CLI tools with proper validation, enum checks, and ODS-branded HTML

Clean 3-layer separation: dispatcher (bash) → agent (Claude) → CLI tools (bash)

Good data schema — findings JSON includes security reviews, supply chain risk, license checks

Systemd timer is reliable, persistent, and handles missed runs

Bidirectional Slack integration (post summaries + receive ad-hoc URLs via bridge)

Google Drive sync provides offsite backup of all reports

19 dependencies tracked with version comparison against last-versions.json

40 repos reviewed with deduplication via reviewed-repos.json tracker