7 Commits

Author SHA1 Message Date
Nathan TeBlunthuis
c3d31b4ab5 handle case when we have a valid resume file, but a corrupted original. 2025-12-10 20:33:04 -08:00
Nathan TeBlunthuis
f4a9491ff2 improve print debugging. 2025-12-10 19:50:47 -08:00
Nathan TeBlunthuis
c6e96c2f54 try/catch opening original file in resume. 2025-12-10 19:49:29 -08:00
Nathan TeBlunthuis
f427291fd8 add logic for resuming after a resume. 2025-12-10 19:26:54 -08:00
Nathan TeBlunthuis
d1fc094c96 don't put checkpoint files inside namespace directories. 2025-12-07 06:24:04 -08:00
Nathan TeBlunthuis
783f5fd8bc improve resume logic. 2025-12-07 06:06:26 -08:00
Nathan TeBlunthuis
577ddc87f5 Add per-namespace resume support for partitioned parquet output.
- Implement per-namespace resume points (dict mapping namespace -> (pageid, revid))
  to correctly handle interleaved dump ordering in partitioned output
- Extract resume functionality to dedicated resume.py module
- Add graceful shutdown handling via shutdown_requested flag (CLI-level only)
- Use lazy ParquetWriter creation to avoid empty files on early exit
- Refactor writing logic to _write_batch() helper method
- Simplify control flow by replacing continue statements with should_write flag
2025-12-06 06:56:19 -08:00
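
The per-namespace resume points and the should_write flag described in commit 577ddc87f5 can be illustrated with a small sketch. This is not the code from resume.py. The names `ResumePoint`, `Revision`, `should_write`, and `revisions_to_write` are hypothetical, and the sketch assumes that within a single namespace the (pageid, revid) pairs increase in dump order, which the commit message does not state.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical types for illustration only.
# A resume point is the (pageid, revid) of the last revision already written.
ResumePoint = Tuple[int, int]
# A revision is reduced here to (namespace, pageid, revid).
Revision = Tuple[int, int, int]


def should_write(rev: Revision, resume_points: Dict[int, ResumePoint]) -> bool:
    """Return True if this revision lies past its namespace's resume point."""
    namespace, pageid, revid = rev
    point = resume_points.get(namespace)
    if point is None:
        # No output existed for this namespace, so nothing is skipped.
        return True
    # Assumption: (pageid, revid) is increasing within a namespace,
    # so a tuple comparison decides whether the revision is new.
    return (pageid, revid) > point


def revisions_to_write(
    revisions: Iterable[Revision],
    resume_points: Dict[int, ResumePoint],
) -> Iterator[Revision]:
    """Yield only revisions that were not written before the interruption."""
    for rev in revisions:
        # A flag instead of `continue`, mirroring the refactor in the commit.
        write = should_write(rev, resume_points)
        if write:
            yield rev


# Namespaces are interleaved in the dump, so each one needs its own point.
resume_points = {0: (10, 105), 1: (7, 90)}
dump = [(0, 10, 100), (1, 7, 90), (0, 10, 110), (1, 8, 95), (2, 3, 50)]
remaining = list(revisions_to_write(dump, resume_points))
```

A single global resume point would not work for this input: namespace 1 stopped at page 7 while namespace 0 had already reached page 10, so a global cutoff of page 10 would wrongly skip the namespace 1 revision on page 8.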