
When you install this from git, you will need to first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and update the
submodule::

  git submodule init
  git submodule update


Wikimedia dumps are usually in a compressed format such as 7z (most common),
gz, or bz2. Wikiq uses your computer's compression software to read these
files. Therefore wikiq depends on ``7za``, ``gzcat``, and ``zcat``.
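
As a quick check before running wikiq, the sketch below reports which of the
tools named above are not on your ``PATH``. The ``missing_tools`` helper is
hypothetical and not part of wikiq; it relies only on the POSIX ``command -v``
builtin::

  # Print the name of each listed tool that is not found on PATH.
  missing_tools() {
    for tool in "$@"; do
      command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
  }

  # Tool names are taken from the paragraph above.
  missing_tools 7za gzcat zcat

No output means all three tools were found. Any name that is printed needs to
be installed, or made available under that name, through your system's
package manager.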

There are also a series of Python dependencies. You can install these using pip
with a command like::

  pip3 install mwbase mwreverts mwxml mwtypes mwcli mwdiffs mwpersistence pandas
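
To confirm that the installation worked, a sketch like the following tries to
import each package and prints any that fail. It assumes each package's import
name matches its pip name, and that ``python3`` is the interpreter that
``pip3`` installs into. The ``missing_modules`` helper is hypothetical and not
part of wikiq::

  # Print the name of each listed module that python3 cannot import.
  missing_modules() {
    for mod in "$@"; do
      python3 -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
  }

  # Module names are taken from the pip3 command above.
  missing_modules mwbase mwreverts mwxml mwtypes mwcli mwdiffs mwpersistence pandas

No output means every module imported successfully.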