FIND-20260324-017 · 2026-03-24 · Innovation Veille

RLM (Recursive Language Models) — Data Scientist Agent Embedded in Programs via DSPy SandboxSerializable

tool MEDIUM
Kevin Madura (@kmad) shares his March 22, 2026 blog post introducing a pattern for embedding LLM-based data analysis agents directly into programs using Recursive Language Models (RLMs). The approach extends DSPy's SandboxSerializable protocol to expose DataFrames to a persistent, REPL-based LLM loop: the model iterates, writes code, inspects the results, and recurses until the analysis is complete. Benchmarked on DABench (257 questions, 68 CSVs), Qwen 3.5 397B reaches 86.8% accuracy, averaging 2.8 iterations. The upstream library (alexzhang13/rlm, 3177 stars, MIT, Python) provides plug-and-play inference with Docker/Modal sandbox support.
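
Illustrative sketch of the loop described above. It is not taken from the blog post or from the rlm/DSPy code bases. It shows only the iterate, run, inspect cycle over a persistent namespace. The names run_repl_loop and scripted_model are hypothetical, a scripted function stands in for the LLM call, a list of dicts stands in for a DataFrame to avoid dependencies, and recursive sub-calls and sandbox isolation are left out.

```python
import contextlib
import io


def run_repl_loop(propose_code, variables, max_iterations=5):
    """Drive a persistent REPL until the generated code sets `answer`.

    propose_code: callable(history) -> str; stands in for the LLM call.
    variables: names exposed to the REPL (for example, the data to analyze).
    """
    namespace = dict(variables)  # persists across iterations
    history = []                 # (code, output) pairs fed back to the model
    for iteration in range(1, max_iterations + 1):
        code = propose_code(history)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                # Plain exec is NOT a sandbox; a real setup would isolate this.
                exec(code, namespace)
            output = buffer.getvalue()
        except Exception as exc:
            # Errors are returned to the model as text so it can correct itself.
            output = buffer.getvalue() + f"{type(exc).__name__}: {exc}"
        history.append((code, output))
        if "answer" in namespace:
            return namespace["answer"], iteration, history
    return None, max_iterations, history


def scripted_model(history):
    """Hypothetical stand-in for an LLM: inspect the data first, then compute."""
    if not history:
        return "print(sorted(rows[0].keys()))"
    return "answer = sum(r['amount'] for r in rows) / len(rows)"


rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 30.0},
]
answer, iterations, history = run_repl_loop(scripted_model, {"rows": rows})
print(answer, iterations)
```

In this sketch the first iteration only inspects the column names and the second produces the answer. The iteration count returned by the loop is the same kind of figure as the 2.8 average quoted above.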

Source

https://x.com/kmad/status/2035790703180005507?s=46

ODS Impact

Relevant to the ODS Data Platform Zero-ETL layer and to any future analytics automation within the platform. The RLM pattern, which embeds a REPL-looping LLM into data workflows, could serve as the intelligence layer above ClickHouse for ad-hoc cohort analysis, anomaly detection, or report generation without custom ETL code. The DSPy SandboxSerializable protocol is directly applicable to any service that produces DataFrames from ClickHouse queries (Billing Engine analytics, Metabase supplement). The upstream RLM library (MIT) is production-ready with Docker sandbox isolation. Not an immediate P0/P1 dependency, but a strong candidate for the P2 ClickHouse + Metabase phase.

Security Review

License: MIT | Maintenance: ACTIVE | Risk: MEDIUM | Recommendation: USE_WITH_CAUTION

Tags

ai-agents rlm dspy llm data-analysis clickhouse python sandbox repl adhoc analytics zero-etl