FIND-20260324-017 — RLM (Recursive Language Models) — Data Scientist Agent Embedded in Programs via DSPy SandboxSerializable

Kevin Madura (@kmad) shares his blog post of March 22, 2026, which introduces a pattern for embedding LLM-based data analysis agents directly into programs using Recursive Language Models (RLMs). The approach extends DSPy's SandboxSerializable protocol to expose DataFrames to a persistent, REPL-based LLM loop: the model writes code, inspects the results, and recurses until the analysis is complete. Benchmarked on DABench (257 questions, 68 CSVs), Qwen 3.5 397B reaches 86.8% accuracy with an average of 2.8 iterations. The upstream library (alexzhang13/rlm; 3,177 stars, MIT license, Python) provides plug-and-play inference with Docker and Modal sandbox support.
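To make the loop concrete, here is a minimal sketch of a persistent REPL loop of the kind described above. It is not the DSPy or alexzhang13/rlm API: `run_repl_loop`, `scripted_model`, and the `final_answer` convention are hypothetical names chosen for illustration, the "model" is a scripted stub rather than an LLM, and a plain list of dicts stands in for a DataFrame so the example needs only the standard library.

```python
import contextlib
import io


def run_repl_loop(model_step, namespace, max_iterations=10):
    """Run model-written code in one persistent namespace until it sets `final_answer`."""
    history = []  # (code, captured output) pairs fed back to the model each turn
    for iteration in range(1, max_iterations + 1):
        code = model_step(history)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # Toy only: a real deployment would run this in an isolated sandbox.
                exec(code, namespace)
        except Exception as exc:
            # Errors are shown to the model as output so it can correct itself.
            buffer.write(f"{type(exc).__name__}: {exc}")
        history.append((code, buffer.getvalue()))
        if "final_answer" in namespace:
            return namespace["final_answer"], iteration
    raise RuntimeError("no final answer within the iteration budget")


def scripted_model(history):
    """Hypothetical stand-in for an LLM: replays a fixed two-step plan."""
    steps = [
        "print(sorted(rows[0].keys()))",  # step 1: inspect the data
        "final_answer = sum(r['amount'] for r in rows) / len(rows)",  # step 2: answer
    ]
    return steps[len(history)]


# Stand-in for a DataFrame exposed to the loop (assumed example data).
rows = [{"customer": "a", "amount": 10.0}, {"customer": "b", "amount": 30.0}]
answer, iterations = run_repl_loop(scripted_model, {"rows": rows})
print(answer, iterations)
```

The key property is that the namespace survives across iterations, so variables created in one step remain available to the next; the iteration count returned here corresponds to the kind of per-question iteration figure reported in the benchmark.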

Relevant to the ODS Data Platform Zero-ETL layer and to any future analytics automation within the platform. The RLM pattern, which embeds a REPL-looping LLM into data workflows, could serve as the intelligence layer above ClickHouse for ad-hoc cohort analysis, anomaly detection, or report generation without custom ETL code. The DSPy SandboxSerializable protocol applies directly to any service that produces DataFrames from ClickHouse queries (for example, Billing Engine analytics or a Metabase supplement). The upstream RLM library (MIT) is production-ready with Docker sandbox isolation. This is not an immediate P0/P1 dependency, but it is a strong candidate for the P2 ClickHouse + Metabase phase.

Source

ODS Impact

Security Review

Tags