Benjamin Mako Hill 2ff4d60613 added counting functionality to regex code
The regex code has historically returned the actual matched patterns and the
named capture groups within regexes.  When trying to count common and/or large
patterns, this leads to very large outputs.

I've added two new options, -RPc and -CPc, that cause wikiq to return
counts of each pattern (0 when there are no matches). The options apply to all
comment or revision patterns. I considered interfaces that would make it
possible to count some patterns but not others, but concluded that this would
be too complicated.
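
The difference between returning matches and returning counts can be
illustrated with Python's ``re`` module. This is only a sketch of the idea, not
wikiq's actual implementation; the function names and the sample pattern are
hypothetical.

```python
import re


def matched_strings(pattern, text):
    """Return every matched substring (output grows with the number of matches)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def match_count(pattern, text):
    """Return only the number of non-overlapping matches (0 when there are none)."""
    return sum(1 for _ in re.finditer(pattern, text))


# Hypothetical sample text standing in for a revision comment.
sample = "reverted vandalism; reverted again"
print(matched_strings(r"revert\w*", sample))
print(match_count(r"revert\w*", sample))
print(match_count(r"spam", sample))
```

For a common pattern the list of matches grows with the input, while the count
stays a single integer per pattern.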

This code should be checked before it's merged.
2023-04-29 11:40:03 -07:00
2023-04-28 14:40:18 -07:00

To install this from git, first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and set up the
submodule with::

  git submodule init
  git submodule update


Wikimedia dumps are usually in a compressed format such as 7z (most common),
gz, or bz2. Wikiq uses your computer's compression software to read these
files. Wikiq therefore depends on `7za`, `gzcat`, and `zcat`.
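
Before running wikiq, it can help to confirm that these programs are on your
``PATH``. The following is a minimal sketch and is not part of wikiq itself: the
helper name ``missing_tools`` is hypothetical, and the tool list is simply
copied from the paragraph above.

```python
import shutil

# Tool names taken from the paragraph above; adjust for your platform.
REQUIRED_TOOLS = ("7za", "gzcat", "zcat")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that `which` cannot find on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing decompression tools: " + ", ".join(missing))
    else:
        print("All decompression tools found.")
```

The ``which`` parameter defaults to ``shutil.which`` and exists only so the
lookup can be replaced when testing.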

Wikiq also has several Python dependencies. You can install these using pip
with a command like::

  pip3 install mwbase mwreverts mwxml mwtypes mwcli mwdiffs mwpersistence