d77b0a4965
migrate to mwpersistence. this fixes many issues. We preserve legacy persistence behavior using the --persistence-legacy option.
2018-07-04 19:06:07 -07:00
7db6288923
migrate reverts to python-mwreverts
2018-07-04 15:29:48 -07:00
e925ac9da1
add tests for wikipedia, malformed xml, bzip2, correct bz2 bug in wikiq.
2018-07-04 15:08:30 -07:00
Benjamin Mako Hill
ba886ecf4c
a number of small updates and fixes
- fix regex for filename/filetype matches
- unload all files in 7z archives, not just ones that end with xml
- fix bug that broke stdout
- minor cosmetic fixes
- updated mediawiki-utilities submodule to latest version
2018-05-17 14:37:20 -07:00
3f9da40747
support 7z archives with multiple files. add urlencode parameter
2017-12-07 15:10:56 -08:00
Benjamin Mako Hill
5d7dceb9e4
fix code to work with bzip files
2017-02-06 18:25:17 -08:00
Benjamin Mako Hill
d934700ee9
added support to parse namespaces from title
This is necessary for wikis (e.g., Wikia XML dumps) that do not include
namespace metadata as tags within each <page>.
2015-07-23 12:12:20 -07:00
Benjamin Mako Hill
eeb0742cc6
created new repository for wikiq with Mediawiki-Utilities as a submodule
2015-07-22 19:44:52 -07:00