Nathan TeBlunthuis
5a3e4102b5
got wikidiff2 persistence working except for paragraph moves.
2025-06-30 15:37:54 -07:00
Will Beason
4bbed4a196
Merge branch 'parquet_support' into test-parquet
2025-06-17 12:20:19 -05:00
Will Beason
390499dd90
Pin to python 3.9
Since our execution environment requires this
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-06-17 11:37:20 -05:00
Nathan TeBlunthuis
f39ceefa4a
Merge branch 'parquet_support' of gitea:collective/mediawiki_dump_tools into parquet_support
2025-05-29 18:05:28 -07:00
Nathan TeBlunthuis
bd22d26291
update deps and add edit_summary to wikiq output.
2025-05-29 18:02:14 -07:00
Nathan TeBlunthuis
22d14dc5f2
Remove dependency on pytest.
2025-05-28 21:54:31 -07:00
Nathan TeBlunthuis
5a10f59dc4
Merge branch 'parquet_support' of gitea:collective/mediawiki_dump_tools into parquet_support
2025-05-28 23:52:59 -05:00
Nathan TeBlunthuis
b8cdc82fc2
add ipython for dev
2025-05-28 23:52:37 -05:00
Nathan TeBlunthuis
2a2b611d79
Fix issue with .7z archives
Before, only Fandom wiki dumps were compressed with .7z.
These archives can contain several .xml files in the .7z, not just one.
So we need to have a flag for the fandom-2020 dumps.
This fixes the bug so .7z archives work in either case.
2025-05-28 21:49:11 -07:00
Nathan TeBlunthuis
39fec0820d
use my version of mwxml since it fixes a bug.
2025-05-28 21:13:18 -07:00
Nathan TeBlunthuis
15e9234903
adding pyproject.toml
2025-05-28 20:59:55 -07:00