
To install from git, first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and update the
submodule::

  git submodule init
  git submodule update


Wikimedia dumps are usually distributed in a compressed format such as 7z (most common), gz, or bz2. Wikiq reads these files using the compression tools installed on your system, so it depends on
``7za``, ``gzcat``, and ``zcat``.
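
As a rough illustration of how a dump might be streamed through one of these tools, the Python sketch below picks a decompression command from the file extension and starts it as a subprocess. The names ``DECOMPRESSORS``, ``decompress_command``, and ``open_dump`` are hypothetical, the exact command lines are assumptions, and ``bzcat`` is assumed for bz2 files; wikiq's own code may differ::

  import shutil
  import subprocess
  from pathlib import Path

  # Hypothetical mapping from file suffix to a command that writes the
  # decompressed data to standard output. These are assumptions, not
  # wikiq's actual command lines.
  DECOMPRESSORS = {
      ".7z": ["7za", "x", "-so"],
      ".gz": ["zcat"],
      ".bz2": ["bzcat"],
  }

  def decompress_command(path):
      """Return the argument list used to decompress `path` to stdout."""
      suffix = Path(path).suffix.lower()
      if suffix not in DECOMPRESSORS:
          raise ValueError(f"unsupported dump format: {suffix!r}")
      return DECOMPRESSORS[suffix] + [str(path)]

  def open_dump(path):
      """Start the decompressor and return the running process."""
      cmd = decompress_command(path)
      # Fail early with a clear message if the tool is not installed.
      if shutil.which(cmd[0]) is None:
          raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
      return subprocess.Popen(cmd, stdout=subprocess.PIPE)

Reading from ``open_dump(path).stdout`` then yields the decompressed XML as a byte stream without writing it to disk.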

Dependencies
----------------
These non-Python dependencies must be installed on your system for wikiq and its
associated tests to work.

- 7zip
- ffmpeg

A new diff engine based on `wikidiff2`_ can be used for word persistence. Wikiq can also output the diffs between successive revisions of each page. This requires Wikidiff2 to be installed on your system. On Debian or Ubuntu Linux, install it with::

  apt-get install php-wikidiff2

You may also need to enable the extension::

  sudo phpenmod wikidiff2
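
To confirm that PHP can see the extension, one option is to look for it in the module list printed by ``php -m``. The following Python sketch does that; the function names are hypothetical, and it assumes that ``php`` is on your ``PATH`` and that the extension is listed as ``wikidiff2``::

  import subprocess

  def wikidiff2_loaded(module_listing):
      """Return True if a line of `php -m` output names wikidiff2."""
      return any(
          line.strip().lower() == "wikidiff2"
          for line in module_listing.splitlines()
      )

  def check_php():
      """Run `php -m` and report whether wikidiff2 is among the modules."""
      result = subprocess.run(
          ["php", "-m"], capture_output=True, text=True, check=True
      )
      return wikidiff2_loaded(result.stdout)

If ``check_php()`` returns ``False`` after installation, running the ``phpenmod`` command above is the first thing to try.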

Tests
-----
To run tests::

   python -m unittest test.Wikiq_Unit_Test

TODO
----
1. [] Output metadata about the run. What parameters were used? What versions of deltas?
2. [] URL encoding by default

.. _wikidiff2: https://www.mediawiki.org/wiki/Wikidiff2