mediawiki_dump_tools/test
Nathan TeBlunthuis 577ddc87f5 Add per-namespace resume support for partitioned parquet output.
- Implement per-namespace resume points (dict mapping namespace -> (pageid, revid))
  to correctly handle interleaved dump ordering in partitioned output
- Extract resume functionality to dedicated resume.py module
- Add graceful shutdown handling via shutdown_requested flag (CLI-level only)
- Use lazy ParquetWriter creation to avoid empty files on early exit
- Refactor writing logic to _write_batch() helper method
- Simplify control flow by replacing continue statements with should_write flag
2025-12-06 06:56:19 -08:00
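Below is a minimal sketch of the per-namespace resume check described in the commit above. It makes a few assumptions. Each revision is reduced to a hypothetical `(namespace, pageid, revid)` tuple. Within a single namespace, the dump yields revisions in increasing `(pageid, revid)` order, even when namespaces are interleaved. The name `filter_resumed` is illustrative only and is not the actual API of `resume.py`.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (namespace, pageid, revid).
Revision = Tuple[int, int, int]


def filter_resumed(
    revisions: Iterable[Revision],
    resume_points: Dict[int, Tuple[int, int]],
) -> Iterator[Revision]:
    """Yield only the revisions that come after their namespace's resume point.

    Assumption: within one namespace, revisions arrive in increasing
    (pageid, revid) order, while different namespaces may be interleaved.
    """
    for ns, pageid, revid in revisions:
        point = resume_points.get(ns)
        # A namespace with no recorded resume point has nothing written yet,
        # so every revision is kept. Otherwise keep only rows strictly after
        # the last (pageid, revid) already written for that namespace.
        should_write = point is None or (pageid, revid) > point
        if should_write:
            yield (ns, pageid, revid)


if __name__ == "__main__":
    # Interleaved dump order: namespace 0, 1, 0, 1, then 2.
    revs = [(0, 1, 10), (1, 2, 11), (0, 3, 12), (1, 4, 13), (2, 5, 14)]
    resume = {0: (1, 10), 1: (2, 11)}
    print(list(filter_resumed(revs, resume)))
    # prints [(0, 3, 12), (1, 4, 13), (2, 5, 14)]
```

In this sketch, resume points are keyed by namespace so that each partition can skip its own already-written rows independently. A single global `(pageid, revid)` would not be enough when partitions stop at different positions in an interleaved dump.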
baseline_output fix baseline output for new columns. 2025-12-02 19:22:08 -08:00
dumps added regex scanner v2's dump unit test file regextest.xml.bz2 2019-11-07 14:06:15 -06:00
test_diff_revisions add test files. 2025-07-07 11:29:10 -07:00
__init__.py Make tests runnable from anywhere 2025-05-27 13:40:57 -05:00
test_wiki_diff_matcher.py make wikiq memory efficient again via batch processing. 2025-07-15 15:20:17 -07:00
Wikiq_Unit_Test.py Add per-namespace resume support for partitioned parquet output. 2025-12-06 06:56:19 -08:00