Will Beason 3f94144b1b Begin adding test for parquet export
Changed logic for handling anonymous edits so that wikiq handles
the type for editor ids consistently. Parquet can mix int64 and
None, but not int64 and strings - previously the code used the empty
string to denote anonymous editors.

Tests failing. Don't merge yet.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-28 13:17:30 -05:00
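
The type rule described in the commit message can be illustrated with a minimal sketch. The helper name ``normalize_editor_id`` and the inputs it accepts are assumptions made for illustration, not wikiq's actual code; the only point is that anonymous editors become ``None`` rather than the empty string, so the column holds integers and nulls only.

```python
from typing import Optional, Union


def normalize_editor_id(raw: Union[int, str, None]) -> Optional[int]:
    """Return an int editor id, or None for an anonymous edit.

    Hypothetical helper: the name and accepted inputs are assumptions
    for illustration, not wikiq's actual code.
    """
    # Treat both None and the legacy empty-string marker as "anonymous".
    if raw is None or raw == "":
        return None
    # Coerce everything else to int so the column has a single value type.
    return int(raw)


# A column built this way holds only ints and None, never strings.
editor_ids = [normalize_editor_id(v) for v in [42, "", "7", None]]
print(editor_ids)  # [42, None, 7, None]
```

A column of this shape maps onto a nullable int64 column, which is the combination the commit message says Parquet accepts.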

To install from git, first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and set up the
submodule::

  git submodule init
  git submodule update


Wikimedia dumps are usually compressed as 7z (most common), gz, or bz2. Wikiq
reads these files using the compression software installed on your computer,
so it depends on ``7za``, ``gzcat``, and ``zcat``.
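
As a rough sketch of how a dump's file extension could select the external decompressor, consider the snippet below. The mapping, the ``decompress_command`` name, and the fallback to ``cat`` are assumptions for illustration and may not match what wikiq does internally. In particular, ``bzcat`` is not named above and is assumed here for bz2 files.

```python
from typing import Dict, List

# Hypothetical mapping from file suffix to a "decompress to stdout" command.
# `7za x -so` extracts an archive to standard output; `zcat` and `bzcat`
# write the decompressed stream to standard output as well.
DECOMPRESSORS: Dict[str, List[str]] = {
    ".7z": ["7za", "x", "-so"],
    ".gz": ["zcat"],    # some systems ship this tool as `gzcat`
    ".bz2": ["bzcat"],  # assumption: not listed in this README
}


def decompress_command(path: str) -> List[str]:
    """Build the argv list that streams `path` decompressed to stdout."""
    for suffix, argv in DECOMPRESSORS.items():
        if path.endswith(suffix):
            return argv + [path]
    # No known compression suffix: stream the file unchanged.
    return ["cat", path]


print(decompress_command("dump.xml.7z"))
```

The resulting list can be passed to ``subprocess.Popen(..., stdout=subprocess.PIPE)`` so that the XML is read as a stream rather than unpacked to disk first.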

Dependencies
----------------
These non-Python dependencies must be installed on your system for wikiq and its
associated tests to work.

- 7zip
- ffmpeg
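
A quick way to confirm that these tools are on your ``PATH`` before running wikiq is sketched below. The ``missing_tools`` helper is hypothetical, and the executable names ``7za`` and ``ffmpeg`` are assumptions about how the packages are installed on your system.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the dependencies listed above.
    missing = missing_tools(["7za", "ffmpeg"])
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All external tools found.")
```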

Tests
-----
To run tests::

   python -m unittest test.Wikiq_Unit_Test

TODO:
_______________
1. [] Output metadata about the run. What parameters were used? What versions of deltas?
2. [] URL encoding by default