DevOps Report · lejecos / migration

BA Review Remediation

Step 1 crawler stub → full implementation

Fixed

Findings received: crawler stub

Findings fixed: fully resolved

Remaining: none outstanding

Fix Applied

crawler_implementation · Step 1: Crawlee / PlaywrightCrawler integration · PASS

Replaced the TODO stub in src/steps/1-crawl.ts with a full PlaywrightCrawler implementation. The crawler now discovers all article URLs from the site sitemap, extracts all required fields, and writes both output files correctly.
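As a rough illustration of sitemap-based discovery with a fallback, the sketch below shows one way this could work. It is not the code in src/steps/1-crawl.ts: `extractLocs`, `discoverUrls`, the injected `fetchText` function, and the idea of falling back to a caller-supplied URL list are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and the fallback strategy are assumptions.

/** Function that downloads a URL and returns its body as text. */
export type FetchText = (url: string) => Promise<string>;

/** Extract every <loc> value from a sitemap XML string (no CDATA support). */
export function extractLocs(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so undo that before using the URL.
    urls.push(match[1].replace(/&amp;/g, '&'));
  }
  return urls;
}

/** Discover URLs from a sitemap, returning fallbackUrls if the fetch fails. */
export async function discoverUrls(
  sitemapUrl: string,
  fetchText: FetchText,
  fallbackUrls: string[] = [],
): Promise<string[]> {
  try {
    const xml = await fetchText(sitemapUrl);
    return extractLocs(xml);
  } catch {
    // Assumed behaviour: a fetch error is not fatal, the caller's list is used.
    return fallbackUrls;
  }
}
```

Injecting `fetchText` keeps the parsing logic testable without network access; a real run would pass a thin wrapper around an HTTP client.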

URL discovery via sitemap.xml — discoverUrlsFromSitemap() with fallback on fetch error

Field extraction: title, content, published_at, category_name, author_name, featured_image_url, WMaker node_id

Writes data/articles.json (array of crawled articles) and data/url-map.csv (including /node/NNNNN aliases for WMaker IDs)

Checkpoint every 100 articles, with resume: survives interruption without re-crawling processed URLs

Rate limiting at a maximum of 2 req/s, configurable via the CRAWL_MAX_RPS env var

Published date normalised to ISO-8601 UTC; relative image URLs resolved to absolute
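Several of the behaviours listed above reduce to small pure functions. The following is a minimal sketch under stated assumptions, not the actual implementation: every function name, signature, and default here is hypothetical. In particular, the default of 2 req/s is inferred from the report, and `new Date()` parsing is only reliable for ISO-like input, so site-specific date formats would need a dedicated parser.

```typescript
// Hypothetical helpers; names, signatures, and defaults are illustrative
// assumptions and are not copied from src/steps/1-crawl.ts.

/** Normalise a date string to ISO-8601 UTC, or null if it cannot be parsed. */
export function normalisePublishedAt(raw: string): string | null {
  const parsed = new Date(raw);
  return Number.isNaN(parsed.getTime()) ? null : parsed.toISOString();
}

/** Resolve a possibly relative image URL against the page it was found on. */
export function resolveImageUrl(src: string, pageUrl: string): string {
  return new URL(src, pageUrl).href;
}

/** Minimum delay between requests for a CRAWL_MAX_RPS-style setting. */
export function minIntervalMs(rawRps: string | undefined, defaultRps = 2): number {
  const rps = Number(rawRps);
  // Fall back to the default when the value is missing, non-numeric, or <= 0.
  const effective = Number.isFinite(rps) && rps > 0 ? rps : defaultRps;
  return 1000 / effective;
}

/** True when the processed count has reached a checkpoint boundary. */
export function isCheckpointDue(processed: number, every = 100): boolean {
  return processed > 0 && processed % every === 0;
}

/** On resume, keep only the URLs not already recorded in the checkpoint. */
export function pendingUrls(all: string[], done: string[]): string[] {
  const seen = new Set(done);
  return all.filter((url) => !seen.has(url));
}
```

With these definitions, `minIntervalMs(process.env.CRAWL_MAX_RPS)` yields 500 ms when the variable is unset, which corresponds to 2 requests per second.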

Files changed

src/steps/1-crawl.ts

src/steps/2-extract.ts

src/steps/3-import-strapi.ts

src/steps/4-import-media.ts

src/steps/5-generate-redirects.ts

src/steps/6-verify.ts

src/lib/index.ts

src/lib/strapi-client.ts

src/lib/html-cleaner.ts

src/lib/url-normalizer.ts

src/lib/config.ts

src/lib/logger.ts

src/lib/files.ts

src/lib/slugify.ts

src/lib/types.ts

.env.example

Library reorganisation — src/shared/ → src/lib/

src/shared/ was renamed to src/lib/ and extended.

New modules added: strapi-client.ts, html-cleaner.ts, url-normalizer.ts, index.ts (barrel)

All 6 pipeline steps now import from ../lib/. Added .env.example with all required environment variables.

Validation Results

Check	Result	Detail
tsc --noEmit	PASS	Zero type errors after fixing Buffer→Uint8Array and unused import
vitest run	PASS	45 / 45 tests passing across 4 test files
docker build	N/A	CLI scripts — no Dockerfile in scope
cargo check / clippy	N/A	Node.js / TypeScript project

Commit: 0f33fbdd

Branch: dev

Date: 2026-03-20

Service: migration

Project: lejecos