Non-Compliant
1 criterion MISSING · 0 HIGH severity deviations · 1 MEDIUM deviation
Total Criteria: 29
Met: 24
Missing: 1
Partial: 0
N/A: 4
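The headline verdict can be cross-checked against these counts. The following is a minimal sketch, not the review tool's actual logic: the `Status`, `Tally`, and `Verdict` types and the `statuses_from_counts` helper are hypothetical, and the rule that any Missing or Partial criterion yields a non-compliant verdict is an assumption that happens to match this report. N/A criteria are excluded from the applicable total, as the report footer states.

```rust
/// Review status assigned to one acceptance criterion (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Met,
    Missing,
    Partial,
    NotApplicable,
}

/// Overall outcome of a review (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Compliant,
    NonCompliant,
}

/// Number of criteria in each status.
#[derive(Debug, Default, PartialEq, Eq)]
struct Tally {
    met: usize,
    missing: usize,
    partial: usize,
    not_applicable: usize,
}

impl Tally {
    /// Counts how many criteria fall into each status.
    fn from_statuses(statuses: &[Status]) -> Self {
        let mut tally = Tally::default();
        for status in statuses {
            match status {
                Status::Met => tally.met += 1,
                Status::Missing => tally.missing += 1,
                Status::Partial => tally.partial += 1,
                Status::NotApplicable => tally.not_applicable += 1,
            }
        }
        tally
    }

    /// All criteria that were evaluated, including N/A.
    fn total(&self) -> usize {
        self.met + self.missing + self.partial + self.not_applicable
    }

    /// Criteria that count toward compliance (N/A excluded).
    fn applicable(&self) -> usize {
        self.met + self.missing + self.partial
    }

    /// Assumed rule: any Missing or Partial criterion means non-compliant.
    fn verdict(&self) -> Verdict {
        if self.missing == 0 && self.partial == 0 {
            Verdict::Compliant
        } else {
            Verdict::NonCompliant
        }
    }
}

/// Builds a status list from per-status counts (illustrative helper).
fn statuses_from_counts(met: usize, missing: usize, partial: usize, na: usize) -> Vec<Status> {
    let mut statuses = vec![Status::Met; met];
    statuses.extend(vec![Status::Missing; missing]);
    statuses.extend(vec![Status::Partial; partial]);
    statuses.extend(vec![Status::NotApplicable; na]);
    statuses
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_from_this_review_are_non_compliant() {
        let tally = Tally::from_statuses(&statuses_from_counts(24, 1, 0, 4));
        assert_eq!(tally.total(), 29);
        assert_eq!(tally.applicable(), 25);
        assert_eq!(tally.verdict(), Verdict::NonCompliant);
    }
}
```

Under this assumed rule, 24 Met out of 25 applicable criteria still produces a non-compliant verdict, because the single Missing criterion is enough to fail the review.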
Spec note — no spec.md found
No spec.md at /dev/specs/ods-platform/specs/redpanda/spec.md. Acceptance criteria derived from CLAUDE.md (service-specific), global CLAUDE.md, and ADR-001 through ADR-005. This is an Infrastructure service — a shared Rust library crate (ods-events). REST API, PostgreSQL schema, RLS, audit trail, soft-delete, and service binary Dockerfile checks are classified N/A.
Changes since last review (6f994515 → a0b74bd1)
Acceptance Criteria
Met: Criterion fully satisfied
Missing: Not implemented
Partial: Incomplete
N/A: Does not apply to infra service
ID | Criterion | Status | Evidence
AC-001 | CloudEvents v1.0 format with ODS extensions | Met | src/event.rs: CloudEvent struct, specversion='1.0', tenantid + correlationid required fields. validate() enforces all rules. 13 unit tests.
AC-002 | Async EventProducer publishing CloudEvents | Met | src/client/producer.rs: EventProducer wraps rdkafka FutureProducer. emit() + emit_raw(). Subject as Kafka key for partition affinity. 5s default timeout. 6 unit tests.
AC-003 | Async EventConsumer receiving CloudEvents | Met | src/client/consumer.rs: EventConsumer wraps StreamConsumer. 1 MB payload guard, malformed-message skip + warn, stream-ended error propagation. 7 tests.
AC-004 | Topic naming convention: events.{source} | Met | CloudEvent::topic() returns format!("events.{}", source). Verified by topic_follows_ods_convention test.
AC-005 | 20 ODS platform topics with correct retention/partition configs | Met | src/admin/topic.rs: 9 events (7d/3p/delete), 7 CDC (3d/1p/compact), 4 billing (30d/3p/delete). 13 unit tests verify counts, names, retention values, cleanup policies.
AC-006 | Idempotent topic provisioning on cluster | Met | src/admin/provisioning.rs: TopicProvisioner::provision() + provision_all(). Fetches existing topics, skips known ones. ProvisionResult::{Created, AlreadyExists, Failed}. validate_topic_configs() detects drift. 8 tests.
AC-007 | Cluster health checks | Met | src/admin/health.rs: HealthChecker::check() returns ClusterHealth with broker count and visible topics (internal _* filtered). topic_exists() helper. 4 tests.
AC-008 | Metrics collection (cluster, consumer lag, watermarks) | Met | src/monitoring/metrics.rs: MetricsCollector, ClusterMetrics, ConsumerGroupLag, PartitionLag. Watermark-delta message counting. 14 unit tests.
AC-009 | Alert evaluation with configurable thresholds | Met | src/monitoring/alerting.rs: AlertEvaluator with AlertThresholds (lag_warning:1000, lag_critical:10000, min_brokers:1, min_topics:20). evaluate_all() sorts critical-first. 17 tests including boundary conditions.
AC-010 | Prometheus text exposition format export | Met | src/monitoring/prometheus.rs: PrometheusExporter outputs HELP/TYPE/gauge lines for broker count, topic count, per-topic messages, per-partition watermarks. Configurable prefix. 10 tests.
AC-011 | Webhook-based alert forwarding (Slack-compatible) | Met | src/monitoring/webhook.rs: WebhookNotifier, WebhookPayloadBuilder::build_generic() + build_slack(). Sensitive headers (Authorization, X-Api-Key) redacted during serialization. 22 tests.
AC-012 | Cross-cluster replication (ActivePassive / ActiveActive) | Met | src/replication/replicator.rs: TopicReplicator::replicate_batch() consumes source, produces to destination. status() computes per-partition lag via watermark comparison. Mode: ActivePassive or ActiveActive.
AC-013 | Topic filtering (All / Explicit / Prefix) | Met | src/replication/config.rs: TopicFilter::{All, Explicit(Vec), Prefix(String)}. Internal _* topics always excluded. filter_topics() helper. 7 tests.
AC-014 | Topic mapping (Identity / Prefixed) | Met | TopicMapping::{Identity, Prefixed(String)} with map_topic(). Empty-prefix edge case tested. 3 tests.
AC-015 | Replication lag tracking across partitions | Met | src/replication/status.rs: ReplicationStatus + PartitionReplicationStatus with has_critical_lag(), max_lag(), topic_count(). 7 tests.
AC-016 | Structured error handling (all failure modes) | Met | src/error.rs: EventError with 6 variants via thiserror: Serialization, Send, Consume, InvalidEvent, Monitoring, Replication. 6 tests.
AC-017 | TLS/SASL security config for broker connections | Met | src/security.rs: SecurityConfig with Plaintext, Ssl, SaslPlaintext, or SaslSsl; SCRAM-SHA-256/512; with_client_cert() for mTLS. apply() writes to rdkafka::ClientConfig. 9 tests.
AC-018 | CI/CD pipeline (test + clippy + fmt + build + deploy) | Met | .github/workflows/ci-deploy.yml: cargo fmt --check, clippy -D warnings, cargo test, Docker push to registry.agirdigital.com, Coolify webhooks for dev/staging/prod.
AC-019 | Redpanda single-node production deployment | Met | docker-compose.yml: Redpanda v24.3.1, VPN-bound ports on 10.0.0.3, 2 SMP, 4 GB memory, healthcheck via rpk cluster health. Schema Registry + Console included.
AC-020 | Prometheus + Grafana observability stack | Met | docker-compose.yml: prom/prometheus:v2.51.2 (30-day retention), grafana/grafana:11.0.0. config/prometheus.yml scrapes /public_metrics + /metrics at 15s.
AC-021 | Shell script for topic creation via rpk | Met | scripts/create-topics.sh: creates all 20 topics with correct configs via rpk. Idempotent. Accepts optional BROKER argument.
AC-022 | Architecture Decision Records (ADR-001..005) | Met | docs/adr/: ADR-001 (crate design), ADR-002 (topic provisioning), ADR-003 (monitoring), ADR-004 (replication), ADR-005 (third-party integrations). All Accepted.
AC-023 | events.redpanda self-topic in registry | Met | Fixed in this commit. all_topics() now includes events.redpanda (9 events total, 20 overall). Test event_topic_names_match_spec() verifies its presence. Previously MISSING.
AC-024 | Grafana admin password mandatory enforcement | Met | Fixed in this commit. docker-compose.yml line 83 now uses ${GRAFANA_ADMIN_PASSWORD:?...}, which fails startup if unset. Previously PARTIAL (defaulted to 'admin').
AC-025 | Integration tests with real Redpanda (testcontainers) | Missing | docker-compose.test.yml is prepared but testcontainers is absent from Cargo.toml. All 163 tests are unit-only. ADR-001 and ADR-004 explicitly defer this. Unchanged since prior review.
AC-026 | PostgreSQL schema with RLS by tenant_id | N/A | Infrastructure service: shared library crate. No application persistence layer.
AC-027 | REST API endpoints with JWT authentication | N/A | Infrastructure service: library crate with no HTTP server.
AC-028 | Audit trail (who/when/what for mutations) | N/A | Infrastructure service: no mutable application state. Audit trail applies to consuming application services.
AC-029 | Soft delete only | N/A | Infrastructure service: no database persistence layer.
29 criteria evaluated · 4 N/A (excluded from compliance) · 24 MET · 1 MISSING · 0 PARTIAL
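To make the topic figures cited under AC-004, AC-005, and AC-023 easier to check, here is a minimal sketch of the three topic classes and the naming convention. It is not the crate's src/admin/topic.rs: the `TopicClass` struct, `CleanupPolicy` enum, and helper functions are hypothetical names, and only the counts, retention periods, partition counts, and cleanup policies come from the evidence above. The `retention_ms` helper converts days into the millisecond value that Kafka's `retention.ms` topic setting expects.

```rust
/// Kafka cleanup policy applied to a topic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CleanupPolicy {
    Delete,
    Compact,
}

/// Settings shared by every topic in one class (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TopicClass {
    name: &'static str,
    topic_count: usize,
    retention_days: u32,
    partitions: u32,
    cleanup: CleanupPolicy,
}

/// The three classes listed under AC-005: events, CDC, billing.
const TOPIC_CLASSES: [TopicClass; 3] = [
    TopicClass {
        name: "events",
        topic_count: 9,
        retention_days: 7,
        partitions: 3,
        cleanup: CleanupPolicy::Delete,
    },
    TopicClass {
        name: "cdc",
        topic_count: 7,
        retention_days: 3,
        partitions: 1,
        cleanup: CleanupPolicy::Compact,
    },
    TopicClass {
        name: "billing",
        topic_count: 4,
        retention_days: 30,
        partitions: 3,
        cleanup: CleanupPolicy::Delete,
    },
];

/// Total number of topics across all classes.
fn total_topics() -> usize {
    TOPIC_CLASSES.iter().map(|class| class.topic_count).sum()
}

/// Retention expressed in milliseconds, the unit used by `retention.ms`.
fn retention_ms(days: u32) -> u64 {
    u64::from(days) * 24 * 60 * 60 * 1000
}

/// Topic name for an event source, following the events.{source} convention.
fn event_topic(source: &str) -> String {
    format!("events.{}", source)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn registry_matches_reported_counts() {
        assert_eq!(total_topics(), 20);
        assert_eq!(event_topic("redpanda"), "events.redpanda");
        assert_eq!(retention_ms(TOPIC_CLASSES[0].retention_days), 604_800_000);
    }
}
```

Keeping the per-class settings in one table like this is one way to let both a provisioning routine and a drift check read from the same source, though the crate may organize it differently.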
Deviations
MEDIUM: Integration tests absent
All 163 tests are unit-only. No integration tests exercise real Redpanda broker interactions. docker-compose.test.yml is prepared but testcontainers is not a dependency and no tests/integration/ directory exists. ADR-001 and ADR-004 explicitly defer this as follow-up work.
Spec reference: Global CLAUDE.md — TDD: Integration tests: Real PostgreSQL + real Redpanda (testcontainers)
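As a starting point for the follow-up work, the sketch below shows one way a broker-backed test could be gated so that it runs only when a real broker is reachable. It deliberately uses only the Rust standard library instead of testcontainers, so it illustrates the gating pattern and is not the integration suite the spec calls for. The `REDPANDA_BROKER` environment variable and both helper functions are hypothetical and do not appear in the reviewed crate.

```rust
use std::env;
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Reads the broker address from the hypothetical REDPANDA_BROKER variable.
/// Returns None when the variable is unset or blank.
fn broker_from_env() -> Option<String> {
    env::var("REDPANDA_BROKER")
        .ok()
        .filter(|value| !value.trim().is_empty())
}

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// This only proves that the port accepts connections, not that the
/// Kafka protocol is healthy.
fn broker_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.to_socket_addrs() {
        Ok(mut candidates) => {
            candidates.any(|candidate| TcpStream::connect_timeout(&candidate, timeout).is_ok())
        }
        Err(_) => false,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn broker_accepts_connections() {
        // Skip quietly when no broker is configured, so unit-only runs stay green.
        let addr = match broker_from_env() {
            Some(addr) => addr,
            None => {
                eprintln!("REDPANDA_BROKER not set; skipping broker-backed test");
                return;
            }
        };
        assert!(
            broker_reachable(&addr, Duration::from_secs(5)),
            "broker at {addr} is not reachable"
        );
        // A real integration test would go on to produce and consume a
        // CloudEvent through the crate's producer and consumer here.
    }
}
```

A testcontainers-based suite would replace the environment variable with a container started by the test itself, which is what the spec reference above asks for.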
Review notes
This review covers commit a0b74bd1 (SPLITa refactor). The reorganization is a pure structural refactor — all 163 tests preserved, no functional regressions. Two deviations from the previous review (commit 6f994515) have been resolved: events.redpanda self-topic added and Grafana password enforcement hardened. Only one MISSING criterion remains: integration tests with real Redpanda. The library is functionally complete for its P0 infrastructure role. Status is non-compliant solely due to the absent integration test suite — addressable in a single follow-up task.