Commit Graph

18 Commits

Author SHA1 Message Date
Nathan TeBlunthuis
d20075b323 add memray for debugging memory usage. 2025-07-17 15:17:23 -07:00
Nathan TeBlunthuis
37d095199a inc version. 2025-07-15 15:37:55 -07:00
Nathan TeBlunthuis
507335941d Revert "Merge branch 'compute-diffs' into HEAD"
This reverts commit 907a35323e, reversing
changes made to c40506137b.
2025-07-15 15:23:50 -07:00
Nathan TeBlunthuis
e53e7ada5d try fixing the memory problem. 2025-07-14 18:58:27 -07:00
Nathan TeBlunthuis
c597a6b7f4 refactor into src-layout package. 2025-07-07 20:14:13 -07:00
Nathan TeBlunthuis
14e819e565 compare pywikidiff2 to making requests to wikidiff2. 2025-07-07 10:51:11 -07:00
Nathan TeBlunthuis
4654911533 almost there. working out edge cases. 2025-07-03 21:32:44 -07:00
Nathan TeBlunthuis
5a3e4102b5 got wikidiff2 persistence working except for paragraph moves. 2025-06-30 15:37:54 -07:00
Will Beason
4bbed4a196 Merge branch 'parquet_support' into test-parquet 2025-06-17 12:20:19 -05:00
Will Beason
390499dd90 Pin to python 3.9
Since our execution environment requires this

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-06-17 11:37:20 -05:00
Nathan TeBlunthuis
f39ceefa4a Merge branch 'parquet_support' of gitea:collective/mediawiki_dump_tools into parquet_support 2025-05-29 18:05:28 -07:00
Nathan TeBlunthuis
bd22d26291 update deps and add edit_summary to wikiq output. 2025-05-29 18:02:14 -07:00
Nathan TeBlunthuis
22d14dc5f2 Remove dependency on pytest. 2025-05-28 21:54:31 -07:00
Nathan TeBlunthuis
5a10f59dc4 Merge branch 'parquet_support' of gitea:collective/mediawiki_dump_tools into parquet_support 2025-05-28 23:52:59 -05:00
Nathan TeBlunthuis
b8cdc82fc2 add ipython for dev 2025-05-28 23:52:37 -05:00
Nathan TeBlunthuis
2a2b611d79 Fix issue with .7z archives
Before, only fandom wiki dumps were compressed with .7z.
These archives can contain several .xml files, not just one.
So we need to have a flag for the fandom-2020 dumps.

This fixes the bug so .7z archives work in either case.
2025-05-28 21:49:11 -07:00
Nathan TeBlunthuis
39fec0820d use my version of mwxml since it fixes a bug. 2025-05-28 21:13:18 -07:00
Nathan TeBlunthuis
15e9234903 adding pyproject.toml 2025-05-28 20:59:55 -07:00