- Implement per-namespace resume points (dict mapping namespace -> (pageid, revid)) to correctly handle interleaved dump ordering in partitioned output
- Extract resume functionality to dedicated resume.py module
- Add graceful shutdown handling via shutdown_requested flag (CLI-level only)
- Use lazy ParquetWriter creation to avoid empty files on early exit
- Refactor writing logic to _write_batch() helper method
- Simplify control flow by replacing continue statements with should_write flag

||
|---|---|---|
| baseline_output | ||
| dumps | ||
| test_diff_revisions | ||
| __init__.py | ||
| test_wiki_diff_matcher.py | ||
| Wikiq_Unit_Test.py | ||
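
The per-namespace resume points mentioned in the commit message can be illustrated with a minimal sketch. This is not the code in resume.py: the names `should_skip` and `filter_resumed` are hypothetical, and the sketch assumes that within a single namespace, revisions arrive in ascending (pageid, revid) order, so a plain tuple comparison decides whether a revision was already written.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical types for illustration only.
ResumePoint = Tuple[int, int]          # (pageid, revid) last written
Revision = Tuple[int, int, int]        # (namespace, pageid, revid)


def should_skip(namespace: int, pageid: int, revid: int,
                resume_points: Dict[int, ResumePoint]) -> bool:
    """Return True if this revision is at or before its namespace's resume point."""
    point = resume_points.get(namespace)
    if point is None:
        # No output exists for this namespace yet, so nothing is skipped.
        return False
    # Assumption: per-namespace order is ascending by (pageid, revid).
    return (pageid, revid) <= point


def filter_resumed(revisions: Iterable[Revision],
                   resume_points: Dict[int, ResumePoint]) -> Iterator[Revision]:
    """Yield only the revisions that still need to be written."""
    for namespace, pageid, revid in revisions:
        if not should_skip(namespace, pageid, revid, resume_points):
            yield (namespace, pageid, revid)


if __name__ == "__main__":
    resume_points = {0: (10, 500), 1: (3, 40)}
    interleaved = [(0, 9, 450), (1, 3, 40), (0, 10, 500),
                   (1, 4, 41), (0, 10, 501), (2, 1, 5)]
    for rev in filter_resumed(interleaved, resume_points):
        print(rev)
```

Keeping one resume point per namespace, rather than a single global one, means a namespace that is further along in its partition does not cause revisions from a slower namespace to be skipped when the dump interleaves them.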