Nathan TeBlunthuis
c40506137b
make wikiq memory efficient again via batch processing.
2025-07-15 15:20:17 -07:00
Nathan TeBlunthuis
76d54ae597
support partitioning output parquet by namespace.
2025-07-07 20:58:43 -07:00
Nathan TeBlunthuis
56c90fe1cc
add missing files + add sorted_columns metadata.
2025-07-07 19:08:31 -07:00
Nathan TeBlunthuis
a8e9e7f4fd
wikidiff2 integration: pwr complete.
test for pwr based on wikidiff2.
2025-07-07 12:18:22 -07:00
Will Beason
0d9ab003f0
Fix tests for new field
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-06-17 12:44:07 -05:00
Will Beason
032fec3198
Remove unnecessary urlencode tests
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-30 13:20:10 -05:00
Will Beason
aec6e5fafa
Refactor collapse user logic
Use simple loop for when we aren't collapsing users.
Add test which covers case when users are deleted.
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-29 15:20:34 -05:00
Will Beason
c0e629a313
Add ability to disable revert detection
Also add test to ensure functionality works.
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-29 11:59:10 -05:00
Will Beason
52757a8239
Add noargs test for ikwiki
This way we can ensure that the parquet code produces equivalent output.
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-28 15:04:10 -05:00
b1bea09ad6
fix bugs and unit tests
2021-10-18 13:33:05 -07:00
414cc5ff2d
validate tests and add asserts and baselines for regex tests.
2019-11-09 12:19:55 -08:00
c84844cfb5
add unit tests for configuring revert_radius
2019-10-07 15:02:30 -07:00
324ccc8e26
update baseline outputs
2019-10-05 16:36:07 -07:00
7cd0bf3b9e
Add parameter for selecting specific namespaces.
2018-08-23 18:49:32 -07:00
f468d1a5b6
add support for persistence with segment matching
2018-08-20 16:08:16 -07:00
bf396ad366
Prefix page titles with namespace names.
2018-07-09 22:11:17 -07:00
dba793c6ac
migrate to mwxml. This completes the migration away from python-mediawiki-utilities. Except for preserving legacy persistence behavior, we can safely use the nice updates from the mediawiki-utils project.
2018-07-05 01:16:00 -07:00
d77b0a4965
migrate to mwpersistence. This fixes many issues. We preserve legacy persistence behavior using the --persistence-legacy option.
2018-07-04 19:06:07 -07:00
e925ac9da1
add tests for wikipedia, malformed xml, and bzip2; correct bz2 bug in wikiq.
2018-07-04 15:08:30 -07:00
d2746879d0
create baseline tests for xml dump processing
2018-07-03 23:43:47 -07:00