Commit Graph

6 Commits

Author | SHA1 | Message | Date
Nathan TeBlunthuis | 93f6ed0ff5 | fix bug by truncating corrupted jsonl lines. | 2025-12-23 19:52:37 -08:00
Nathan TeBlunthuis | 3f1a9ba862 | refactor and enable jsonl output. | 2025-12-21 23:42:18 -08:00
Nathan TeBlunthuis | 6988a281dc | output parquet files in chunks to avoid memory issues with parquet. | 2025-12-20 21:45:39 -08:00
Nathan TeBlunthuis | 6a4bf81e1a | add test for two wikiq jobs in the same directory. | 2025-12-19 11:50:56 -08:00
Nathan TeBlunthuis | 006feb795c | fix interruption handling by breaking the diff loop. | 2025-12-18 18:00:30 -08:00
Nathan TeBlunthuis | 6b4f3939a5 | more work on resuming. | 2025-12-10 21:07:52 -08:00