DevOps Report · lejecos / migration

BA Review Remediation

Step 1 crawler stub → full implementation: Fixed
Findings received: 1 (crawler stub)
Findings fixed: 1 (fully resolved)
Remaining: 0 (none outstanding)
crawler_implementation · Step 1: Crawlee / PlaywrightCrawler integration · PASS

Replaced the TODO stub in src/steps/1-crawl.ts with a full PlaywrightCrawler implementation. The crawler now discovers all article URLs from the site sitemap, extracts all required fields, and writes both output files correctly.

URL discovery via sitemap.xml: discoverUrlsFromSitemap(), with fallback on fetch error
Field extraction: title, content, published_at, category_name, author_name, featured_image_url, WMaker node_id
Writes data/articles.json (array of crawled articles) and data/url-map.csv (including /node/NNNNN aliases for WMaker IDs)
Checkpoint / resume every 100 articles — survives interruption without re-crawling processed URLs
Rate limiting at a maximum of 2 req/s, configurable via the CRAWL_MAX_RPS env var
Published date normalised to ISO-8601 UTC; relative image URLs resolved to absolute
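
The normalisation items in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in src/steps/1-crawl.ts: the function names toIsoUtc, resolveImageUrl and oldPathsFor are hypothetical, the example.com URLs are placeholders, and the actual column layout of data/url-map.csv is not described in this report.

```typescript
// Hypothetical helpers for illustration only; names are not taken from the project.

/** Normalise a parseable date string to ISO-8601 UTC; returns null if it cannot be parsed. */
function toIsoUtc(raw: string): string | null {
  // Date.parse is only guaranteed for ISO-like input; other formats are engine-dependent.
  const ms = Date.parse(raw.trim());
  return Number.isNaN(ms) ? null : new Date(ms).toISOString();
}

/** Resolve a possibly relative image URL against the URL of the page it was found on. */
function resolveImageUrl(src: string, pageUrl: string): string | null {
  try {
    return new URL(src, pageUrl).toString();
  } catch {
    return null; // malformed src or page URL
  }
}

/** List the old paths to record for one article: its own path, plus a /node/<id> alias when a numeric node id exists. */
function oldPathsFor(articleUrl: string, nodeId: string | null): string[] {
  const paths = [new URL(articleUrl).pathname];
  if (nodeId !== null && /^\d+$/.test(nodeId)) {
    paths.push(`/node/${nodeId}`);
  }
  return paths;
}
```

Under these assumptions, toIsoUtc("2024-03-05T10:30:00+01:00") yields "2024-03-05T09:30:00.000Z", and resolveImageUrl("/images/a.jpg", "https://example.com/news/story.html") yields "https://example.com/images/a.jpg". Returning null instead of throwing lets the crawler log and skip a bad field rather than abort the whole article.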
Files changed
src/steps/1-crawl.ts
src/steps/2-extract.ts
src/steps/3-import-strapi.ts
src/steps/4-import-media.ts
src/steps/5-generate-redirects.ts
src/steps/6-verify.ts
src/lib/index.ts
src/lib/strapi-client.ts
src/lib/html-cleaner.ts
src/lib/url-normalizer.ts
src/lib/config.ts
src/lib/logger.ts
src/lib/files.ts
src/lib/slugify.ts
src/lib/types.ts
.env.example
Library reorganisation — src/shared/ → src/lib/
src/shared/ → src/lib/ (renamed and extended)
New modules added: strapi-client.ts, html-cleaner.ts, url-normalizer.ts, index.ts (barrel)
All 6 pipeline steps updated to import from ../lib/. .env.example added with all required environment variables.
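
As an illustration of how a step might consume one of these variables, the sketch below reads CRAWL_MAX_RPS and converts it to a per-minute cap. It is an assumption-laden example: readMaxRps and toRequestsPerMinute are hypothetical names, the fallback of 2 is taken from the 2 req/s figure cited in this report, and the report does not say how the limit is actually enforced. One plausible route is Crawlee's maxRequestsPerMinute crawler option, which is why the conversion is shown.

```typescript
// Hypothetical config helpers; not the contents of src/lib/config.ts.

/** Read CRAWL_MAX_RPS from an env-like object, falling back when it is missing or invalid. */
function readMaxRps(
  env: Record<string, string | undefined>,
  fallback: number = 2, // assumed default, based on the 2 req/s limit cited in the report
): number {
  const raw = env["CRAWL_MAX_RPS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  // Reject NaN, Infinity, zero and negative values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

/** Convert requests per second into a whole-number requests-per-minute cap. */
function toRequestsPerMinute(rps: number): number {
  return Math.floor(rps * 60);
}
```

In a Node.js step this could be called as readMaxRps(process.env), with the result passed through toRequestsPerMinute before being handed to the crawler's options. Falling back silently on an invalid value keeps a typo in .env from disabling throttling altogether.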
Check | Result | Detail
tsc --noEmit | PASS | Zero type errors after fixing Buffer→Uint8Array and an unused import
vitest run | PASS | 45 / 45 tests passing across 4 test files
docker build | N/A | CLI scripts; no Dockerfile in scope
cargo check / clippy | N/A | Node.js / TypeScript project