# ADR-001: Adopt Slack Socket Mode for ADLC and PDLC Orchestrators

- Date: 2026-03-23
- Status: accepted
- Deciders: ODS Engineering (J. Niox, CTO)
- Related:
  - FIND-20260323-008 (DeerFlow by ByteDance — SuperAgent harness)
  - FIND-20260323-009 (DeerFlow vs ADLC Architecture Comparison — Deep Dive)
## Context

The ADLC and PDLC orchestrators currently integrate with Slack using a polling-based architecture:

**Inbound messages:** `slack-bridge.sh` polls the Slack `conversations.history` API every 30 seconds, parses new messages, and injects them into the Claude tmux session via `tmux send-keys`. When tmux is down, messages are written to a filesystem inbox (`~/dev/ops/slack-inbox/`) for the dispatcher to pick up on its next 5-minute cycle.

**Outbound messages:** All agents and the dispatcher post to Slack using direct `curl` calls to the `chat.postMessage` API endpoint, constructing JSON payloads inline in bash.

**Dispatcher cycle:** `dispatcher-v3.sh` runs every 5 minutes via a systemd timer. Combined with the 30-second polling interval of `slack-bridge.sh`, this introduces up to 5.5 minutes of latency between a human sending a Slack message and the system acting on it.
This architecture has several problems:

- **Latency:** Human responses to blocker notifications (credentials added, infrastructure provisioned) take up to 5+ minutes to be processed, slowing down the pipeline.
- **Rate limits:** Polling `conversations.history` every 30 seconds across multiple channels risks hitting Slack's Tier 3 rate limits (50+ requests/minute) as we add more channels.
- **Fragility:** The tmux injection pattern (`tmux send-keys`) is lossy — messages can be dropped if the session is busy, and special characters in message text cause parsing failures.
- **No typing indicators:** The system cannot signal that it is processing a request, leaving humans uncertain whether their message was received.
- **Duplicate processing:** Timestamp-based deduplication in `slack-bridge.sh` is brittle; edge cases around clock drift or rapid messages can cause duplicates or missed messages.

The DeerFlow vs ADLC comparison (FIND-20260323-009) identified Slack Socket Mode as a Phase 1 quick win (6-8 hours of effort) with high ROI for improving human-in-the-loop responsiveness.
## Decision

Replace the current webhook/polling-based Slack integration with **Slack Socket Mode** for both the ADLC and PDLC orchestrators.

### What changes

**New component:** A lightweight Node.js (or Python) Socket Mode client replaces `slack-bridge.sh`. This process maintains a persistent WebSocket connection to Slack and receives events in real time.
**Inbound flow:** Instead of polling `conversations.history`, the Socket Mode client receives `message` events instantly via WebSocket. It writes structured JSON messages to a Unix domain socket or named pipe that the dispatcher and Claude supervisor can read.
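As a sketch of the inbound handoff, the client can normalize each Slack message event into one newline-delimited JSON record and write it to the dispatcher's socket. The socket path and record fields below are illustrative assumptions, not a final schema:

```python
import json
import socket

SOCKET_PATH = "/tmp/slack-inbound.sock"  # hypothetical path, not final

def to_queue_record(event: dict) -> bytes:
    """Normalize a Slack message event into one newline-delimited
    JSON record for the dispatcher (field selection is an assumption)."""
    record = {
        "event_id": event.get("event_id"),  # stable ID, useful for dedup
        "channel": event.get("channel"),
        "user": event.get("user"),
        "text": event.get("text", ""),
        "ts": event.get("ts"),
    }
    return (json.dumps(record) + "\n").encode()

def forward(event: dict) -> None:
    """Send one record over the Unix domain socket the dispatcher owns."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCKET_PATH)
        s.sendall(to_queue_record(event))
```

Because each record is a self-contained JSON line, the dispatcher never has to parse free-form text again.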
**Outbound flow:** Outbound posting continues to use the `chat.postMessage` Web API (unchanged). The Socket Mode client may optionally provide a local HTTP endpoint for agents to post through, adding retry logic and rate-limit handling.
**Dispatcher integration:** `dispatcher-v3.sh` reads from the new message queue (Unix socket, pipe, or directory) instead of relying on tmux injection. Messages arrive as structured JSON, eliminating parsing fragility.
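On the consuming side, structured records also allow deduplication by Slack's stable `event_id` rather than timestamps. A sketch assuming messages arrive as newline-delimited JSON (the queue format is an assumption):

```python
import json

def parse_records(stream: bytes):
    """Yield parsed records from a newline-delimited JSON stream,
    skipping malformed lines instead of crashing the dispatcher."""
    for line in stream.splitlines():
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # the real dispatcher would log and move on

def dedupe(records, seen: set):
    """Drop repeats by event_id, replacing the brittle
    timestamp-based scheme in slack-bridge.sh."""
    for rec in records:
        eid = rec.get("event_id")
        if eid in seen:
            continue
        if eid is not None:
            seen.add(eid)
        yield rec
```

Persisting `seen` (e.g. to a small state file) would carry deduplication across dispatcher restarts.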
**Acknowledgment:** Socket Mode requires explicit acknowledgment of each event within 3 seconds. The client acknowledges immediately upon receipt, then queues the event for processing.
### Affected services

- ADLC orchestrator (`dispatcher-v3.sh`, `slack-bridge.sh`)
- PDLC orchestrator (`dispatcher-pdlc.sh`)
- All agents that post to Slack (via the shared `slack_post()` function)
- Systemd service definitions for `slack-bridge`
### Slack app changes required

- Enable Socket Mode in the Slack app configuration
- Generate an App-Level Token (`xapp-` prefix) with the `connections:write` scope
- The existing Bot Token (`xoxb-`) continues to be used for Web API calls
- Add event subscriptions: `message.channels`, `message.im`, `app_mention`
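The Socket Mode toggle and event subscriptions can be captured in the Slack app manifest so the one-time reconfiguration stays reviewable. A partial manifest sketch; the bot scopes shown are illustrative assumptions, and the App-Level Token (`connections:write`) is still generated separately in the app settings, not via the manifest:

```yaml
# Partial Slack app manifest (sketch; merge into the existing app config)
oauth_config:
  scopes:
    bot:
      - chat:write        # existing xoxb- token keeps using Web API scopes
      - channels:history
      - im:history
settings:
  socket_mode_enabled: true
  event_subscriptions:
    bot_events:
      - message.channels
      - message.im
      - app_mention
```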
### Rollout plan

- **Phase 1 (4h):** Build the Socket Mode client; test locally against a dev Slack workspace
- **Phase 2 (2h):** Modify `dispatcher-v3.sh` to read from the new message queue instead of tmux/inbox
- **Phase 3 (1h):** Deploy; run in parallel with the old `slack-bridge.sh` for 24h
- **Phase 4 (1h):** Disable the old `slack-bridge.sh`; remove the systemd polling service
## Consequences

### Positive

- **Sub-second latency:** Human messages reach the orchestrator in under 1 second instead of up to 5.5 minutes
- **No polling overhead:** Eliminates the `conversations.history` poll every 30 seconds (2,880 calls/day per channel), removing the rate-limit risk entirely
- **Structured messages:** Events arrive as typed JSON from Slack's API, eliminating the fragile text parsing in `slack-bridge.sh`
- **Reliable delivery:** A WebSocket with automatic reconnection is more reliable than timestamp-based polling with filesystem state
- **Typing indicators:** Socket Mode supports sending typing indicators, improving human UX during long-running operations
- **Interactive messages:** Enables future use of Slack Block Kit (buttons, modals, dropdowns) for human-in-the-loop workflows (approve/reject deployments, select options)
- **Alignment with DeerFlow roadmap:** Implements Phase 1 of the incremental adoption plan from FIND-20260323-009
### Negative

- **New runtime dependency:** Adds a Node.js or Python process that must be kept running alongside the bash dispatcher
- **App-Level Token management:** A new token type (`xapp-`) must be generated and secured in `.env.adlc`
- **Slack app reconfiguration:** Requires changes to the Slack app settings (enable Socket Mode, add event subscriptions) — a one-time manual step
- **Complexity increase:** The system goes from a single bash script to a multi-process architecture (Socket Mode client + dispatcher + Claude supervisor)
### Neutral

- Outbound posting (`chat.postMessage` via `curl`) remains unchanged
- The dispatcher's 5-minute systemd timer cycle remains for pipeline scanning; only Slack message ingestion becomes real-time
- `SLACK_BOT_TOKEN` continues to be used for all Web API calls; the new `SLACK_APP_TOKEN` is used only for the WebSocket connection
## Alternatives Considered

### Alternative 1: Reduce polling interval to 5 seconds

- **Description:** Keep `slack-bridge.sh` but poll every 5 seconds instead of every 30
- **Pros:** Zero code changes; immediate improvement
- **Cons:** 17,280 API calls/day per channel, certain to hit Slack rate limits; still not truly real-time; does not fix tmux injection fragility
- **Why rejected:** Trades one problem (latency) for another (rate limits) and does not address the structural issues
### Alternative 2: Slack Events API with HTTP webhook

- **Description:** Use Slack's Events API to receive real-time events via HTTP POST to a public endpoint
- **Pros:** Real-time delivery; well-documented API; no WebSocket management
- **Cons:** Requires a publicly accessible HTTPS endpoint, which means either exposing a port through the firewall or using a tunnel (ngrok/Cloudflare Tunnel); adds attack surface; requires URL-verification challenge handling
- **Why rejected:** Our orchestrators run on internal VPS nodes behind a WireGuard VPN. Exposing an HTTP endpoint increases attack surface unnecessarily. Socket Mode achieves the same real-time delivery without requiring inbound network access.
### Alternative 3: Keep current architecture, optimize dispatcher cycle

- **Description:** Reduce the `dispatcher-v3.sh` timer from 5 minutes to 1 minute and optimize `slack-bridge.sh` parsing
- **Pros:** Minimal changes; stays within bash
- **Cons:** Still polling-based, still fragile tmux injection, still unstructured text parsing
- **Why rejected:** Addresses symptoms but not root causes. Socket Mode is the correct solution for real-time bidirectional Slack communication.
## References

- Innovation finding: FIND-20260323-008 (DeerFlow by ByteDance — SuperAgent harness)
- Innovation finding: FIND-20260323-009 (DeerFlow vs ADLC Architecture Comparison — Deep Dive)
- Slack Socket Mode documentation: https://api.slack.com/apis/socket-mode
- Slack Bolt for JS: https://slack.dev/bolt-js/concepts#socket-mode
- Slack Bolt for Python: https://slack.dev/bolt-python/concepts#socket-mode
- Current implementation: `~/dev/ops/adlc-v2/scripts/slack-bridge.sh`
- Current implementation: `~/dev/ops/adlc-v2/scripts/dispatcher-v3.sh`