# ADR-001: Adopt Slack Socket Mode for ADLC and PDLC Orchestrators

- Date: 2026-03-23
- Status: accepted
- Deciders: ODS Engineering (J. Niox, CTO)
- Related:
  - FIND-20260323-008 (DeerFlow by ByteDance — SuperAgent harness)
  - FIND-20260323-009 (DeerFlow vs ADLC Architecture Comparison — Deep Dive)
## Context

The ADLC and PDLC orchestrators currently integrate with Slack using a polling-based architecture:

**Inbound messages:** `slack-bridge.sh` polls the Slack `conversations.history` API every 30 seconds, parses new messages, and injects them into the Claude tmux session via `tmux send-keys`. When tmux is down, messages are written to a filesystem inbox (`~/dev/ops/slack-inbox/`) for the dispatcher to pick up on its next 5-minute cycle.

**Outbound messages:** All agents and the dispatcher post to Slack using direct `curl` calls to the `chat.postMessage` API endpoint, constructing JSON payloads inline in bash.

**Dispatcher cycle:** `dispatcher-v3.sh` runs every 5 minutes via a systemd timer. Combined with the 30-second polling interval of `slack-bridge.sh`, this introduces up to 5.5 minutes of latency between a human sending a Slack message and the system acting on it.
This architecture has several problems:

- **Latency:** Human responses to blocker notifications (credentials added, infrastructure provisioned) take up to 5+ minutes to be processed, slowing down the pipeline.
- **Rate limits:** Polling `conversations.history` every 30 seconds across multiple channels risks hitting Slack's Tier 3 rate limits (50+ requests/minute) as we add more channels.
- **Fragility:** The tmux injection pattern (`tmux send-keys`) is lossy — messages can be dropped if the session is busy, and special characters in message text cause parsing failures.
- **No typing indicators:** The system cannot signal that it is processing a request, leaving humans uncertain whether their message was received.
- **Duplicate processing:** Timestamp-based deduplication in `slack-bridge.sh` is brittle; edge cases around clock drift or rapid messages can cause duplicates or missed messages.

The DeerFlow vs ADLC comparison (FIND-20260323-009) identified Slack Socket Mode as a Phase 1 quick win (6-8 hours of effort) with high ROI for improving human-in-the-loop responsiveness.
## Decision

Replace the current webhook/polling-based Slack integration with **Slack Socket Mode** for both the ADLC and PDLC orchestrators.

### What changes

**New component:** A lightweight Node.js (or Python) Socket Mode client replaces `slack-bridge.sh`. This process maintains a persistent WebSocket connection to Slack and receives events in real time.
**Inbound flow:** Instead of polling `conversations.history`, the Socket Mode client receives `message` events instantly via WebSocket. It writes structured JSON messages to a Unix domain socket or named pipe that the dispatcher and Claude supervisor can read.
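As a sketch of the inbound handoff, the client can normalize each Slack message event into one newline-delimited JSON record and write it to the dispatcher's socket. The socket path and record fields below are illustrative assumptions, not a final schema:

```python
import json
import socket

SOCKET_PATH = "/tmp/slack-inbound.sock"  # hypothetical path, not final

def to_queue_record(event: dict) -> bytes:
    """Normalize a Slack message event into one newline-delimited
    JSON record for the dispatcher (field selection is an assumption)."""
    record = {
        "event_id": event.get("event_id"),  # stable ID, useful for dedup
        "channel": event.get("channel"),
        "user": event.get("user"),
        "text": event.get("text", ""),
        "ts": event.get("ts"),
    }
    return (json.dumps(record) + "\n").encode()

def forward(event: dict) -> None:
    """Send one record over the Unix domain socket the dispatcher owns."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCKET_PATH)
        s.sendall(to_queue_record(event))
```

Because each record is a self-contained JSON line, the dispatcher never has to parse free-form text again.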
**Outbound flow:** Outbound posting continues to use the `chat.postMessage` Web API (unchanged). The Socket Mode client may optionally provide a local HTTP endpoint for agents to post through, adding retry logic and rate-limit handling.
**Dispatcher integration:** `dispatcher-v3.sh` reads from the new message queue (Unix socket, pipe, or directory) instead of relying on tmux injection. Messages arrive as structured JSON, eliminating parsing fragility.
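On the consuming side, structured records also allow deduplication by Slack's stable `event_id` rather than timestamps. A sketch assuming messages arrive as newline-delimited JSON (the queue format is an assumption):

```python
import json

def parse_records(stream: bytes):
    """Yield parsed records from a newline-delimited JSON stream,
    skipping malformed lines instead of crashing the dispatcher."""
    for line in stream.splitlines():
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # the real dispatcher would log and move on

def dedupe(records, seen: set):
    """Drop repeats by event_id, replacing the brittle
    timestamp-based scheme in slack-bridge.sh."""
    for rec in records:
        eid = rec.get("event_id")
        if eid in seen:
            continue
        if eid is not None:
            seen.add(eid)
        yield rec
```

Persisting `seen` (e.g. to a small state file) would carry deduplication across dispatcher restarts.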
**Acknowledgment:** Socket Mode requires explicit acknowledgment of each event within 3 seconds. The client acknowledges immediately upon receipt, then queues the event for processing.
### Affected services

- ADLC orchestrator (`dispatcher-v3.sh`, `slack-bridge.sh`)
- PDLC orchestrator (`dispatcher-pdlc.sh`)
- All agents that post to Slack (via the shared `slack_post()` function)
- Systemd service definitions for `slack-bridge`
### Slack app changes required

- Enable Socket Mode in the Slack app configuration
- Generate an App-Level Token (`xapp-` prefix) with the `connections:write` scope
- The existing Bot Token (`xoxb-`) continues to be used for Web API calls
- Add event subscriptions: `message.channels`, `message.im`, `app_mention`
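The Socket Mode toggle and event subscriptions can be captured in the Slack app manifest so the one-time reconfiguration stays reviewable. A partial manifest sketch; the bot scopes shown are illustrative assumptions, and the App-Level Token (`connections:write`) is still generated separately in the app settings, not via the manifest:

```yaml
# Partial Slack app manifest (sketch; merge into the existing app config)
oauth_config:
  scopes:
    bot:
      - chat:write        # existing xoxb- token keeps using Web API scopes
      - channels:history
      - im:history
settings:
  socket_mode_enabled: true
  event_subscriptions:
    bot_events:
      - message.channels
      - message.im
      - app_mention
```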
### Rollout plan

- **Phase 1 (4h):** Build the Socket Mode client; test locally against a dev Slack workspace
- **Phase 2 (2h):** Modify `dispatcher-v3.sh` to read from the new message queue instead of tmux/inbox
- **Phase 3 (1h):** Deploy; run in parallel with the old `slack-bridge.sh` for 24h
- **Phase 4 (1h):** Disable the old `slack-bridge.sh`; remove the systemd polling service
## Consequences

### Positive

- **Sub-second latency:** Human messages reach the orchestrator in under 1 second instead of up to 5.5 minutes
- **No polling overhead:** Eliminates the `conversations.history` poll every 30 seconds (2,880 calls/day per channel), removing the rate-limit risk entirely
- **Structured messages:** Events arrive as typed JSON from Slack's API, eliminating the fragile text parsing in `slack-bridge.sh`
- **Reliable delivery:** A WebSocket with automatic reconnection is more reliable than timestamp-based polling with filesystem state
- **Typing indicators:** Socket Mode supports sending typing indicators, improving human UX during long-running operations
- **Interactive messages:** Enables future use of Slack Block Kit (buttons, modals, dropdowns) for human-in-the-loop workflows (approve/reject deployments, select options)
- **Alignment with DeerFlow roadmap:** Implements Phase 1 of the incremental adoption plan from FIND-20260323-009
### Negative

- **New runtime dependency:** Adds a Node.js or Python process that must be kept running alongside the bash dispatcher
- **App-Level Token management:** A new token type (`xapp-`) must be generated and secured in `.env.adlc`
- **Slack app reconfiguration:** Requires changes to the Slack app settings (enable Socket Mode, add event subscriptions) — a one-time manual step
- **Complexity increase:** The system goes from a single bash script to a multi-process architecture (Socket Mode client + dispatcher + Claude supervisor)
### Neutral

- Outbound posting (`chat.postMessage` via `curl`) remains unchanged
- The dispatcher's 5-minute systemd timer cycle remains for pipeline scanning; only Slack message ingestion becomes real-time
- `SLACK_BOT_TOKEN` continues to be used for all Web API calls; the new `SLACK_APP_TOKEN` is used only for the WebSocket connection
## Alternatives Considered

### Alternative 1: Reduce polling interval to 5 seconds

- **Description:** Keep `slack-bridge.sh` but poll every 5 seconds instead of every 30
- **Pros:** Zero code changes; immediate improvement
- **Cons:** 17,280 API calls/day per channel, certain to hit Slack rate limits; still not truly real-time; does not fix tmux injection fragility
- **Why rejected:** Trades one problem (latency) for another (rate limits) and does not address the structural issues
### Alternative 2: Slack Events API with HTTP webhook

- **Description:** Use Slack's Events API to receive real-time events via HTTP POST to a public endpoint
- **Pros:** Real-time delivery; well-documented API; no WebSocket management
- **Cons:** Requires a publicly accessible HTTPS endpoint, which means either exposing a port through the firewall or using a tunnel (ngrok/Cloudflare Tunnel); adds attack surface; requires URL-verification challenge handling
- **Why rejected:** Our orchestrators run on internal VPS nodes behind a WireGuard VPN. Exposing an HTTP endpoint increases attack surface unnecessarily. Socket Mode achieves the same real-time delivery without requiring inbound network access.
### Alternative 3: Keep current architecture, optimize dispatcher cycle

- **Description:** Reduce the `dispatcher-v3.sh` timer from 5 minutes to 1 minute and optimize `slack-bridge.sh` parsing
- **Pros:** Minimal changes; stays within bash
- **Cons:** Still polling-based, still fragile tmux injection, still unstructured text parsing
- **Why rejected:** Addresses symptoms but not root causes. Socket Mode is the correct solution for real-time bidirectional Slack communication.
## References

- Innovation finding: FIND-20260323-008 (DeerFlow by ByteDance — SuperAgent harness)
- Innovation finding: FIND-20260323-009 (DeerFlow vs ADLC Architecture Comparison — Deep Dive)
- Slack Socket Mode documentation: https://api.slack.com/apis/socket-mode
- Slack Bolt for JS: https://slack.dev/bolt-js/concepts#socket-mode
- Slack Bolt for Python: https://slack.dev/bolt-python/concepts#socket-mode
- Current implementation: `~/dev/ops/adlc-v2/scripts/slack-bridge.sh`
- Current implementation: `~/dev/ops/adlc-v2/scripts/dispatcher-v3.sh`