a883cb536b
add note to readme about dependency on compression software
2018-07-04 15:20:52 -07:00
e925ac9da1
add tests for wikipedia, malformed xml, and bzip2; correct bz2 bug in wikiq
2018-07-04 15:08:30 -07:00
d2746879d0
create baseline tests for xml dump processing
2018-07-03 23:43:47 -07:00
Benjamin Mako Hill
ba886ecf4c
a number of small updates and fixes
- fix regex for filename/filetype matches
- unload all files in 7z archives, not just ones that end with xml
- fix bug that broke stdout
- minor cosmetic fixes
- updated mediawiki-utilities submodule to latest version
2018-05-17 14:37:20 -07:00
3f9da40747
support 7z archives with multiple files. add urlencode parameter
2017-12-07 15:10:56 -08:00
Benjamin Mako Hill
5d7dceb9e4
fix code to work with bzip files
2017-02-06 18:25:17 -08:00
Benjamin Mako Hill
7d8ec932dd
added list of compressed dump files to .gitignore
2015-07-23 12:16:31 -07:00
Benjamin Mako Hill
d934700ee9
added support to parse namespaces from title
This is necessary for wikis (e.g., Wikia XML dumps) that do not include
namespace metadata as tags within each <page>.
2015-07-23 12:12:20 -07:00
Benjamin Mako Hill
108c8442b2
added README file to document the submodule
2015-07-22 19:55:08 -07:00
Benjamin Mako Hill
eeb0742cc6
created new repository for wikiq with Mediawiki-Utilities as a submodule
2015-07-22 19:44:52 -07:00