When you install this from git, you will need to first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and set up the
submodule::

  git submodule init
  git submodule update
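
If your version of git supports it, you can also clone the repository and
initialize the submodule in a single step::

  git clone --recurse-submodules git://projects.mako.cc/mediawiki_dump_tools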


Wikimedia dumps are usually distributed in a compressed format such as 7z
(most common), gz, or bz2. Wikiq uses your computer's compression software to
read these files, so it depends on ``7za``, ``gzcat``, and ``zcat``.
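
If ``7za`` is not already available, it can usually be installed through your
system's package manager. A minimal sketch, assuming a Debian/Ubuntu or
Homebrew environment (package names vary by platform)::

  # Debian/Ubuntu: p7zip-full provides 7za; zcat ships with gzip,
  # which is usually preinstalled
  sudo apt-get install p7zip-full

  # macOS with Homebrew: p7zip provides 7za; gzcat ships with the OS
  brew install p7zip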