2ff4d6061399c22eb539f1fd609e7046aa44dba9
The regex code has historically returned the actual matched patterns and the named capture groups within regexes. When trying to count common and/or large patterns, this leads to very large outputs. I've added two new options, -RPc and -CPc, that cause wikiq to return a count of matches for each pattern (0 when there are no matches). The options apply to all comment or revision patterns. I considered interfaces that would make it possible to count some patterns but not others, but concluded that this would make the interface too complicated. This code should be checked before it's merged.
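As a rough illustration of the difference between returning matches and returning counts, the following minimal sketch counts matches per labelled pattern and reports 0 when a pattern does not match. It is not wikiq's implementation: the function name `count_patterns`, the example labels, and the choice to count non-overlapping, case-insensitive matches are all assumptions made for the example.

```python
import re


def count_patterns(text, patterns):
    """Return {label: match count} for each labelled regex.

    Hypothetical helper, not part of wikiq. Counts non-overlapping,
    case-insensitive matches; a pattern with no match yields 0.
    """
    counts = {}
    for label, pattern in patterns.items():
        matches = re.finditer(pattern, text, flags=re.IGNORECASE)
        counts[label] = sum(1 for _ in matches)
    return counts


# Example labels and regexes are made up for illustration.
patterns = {
    "revert": r"\brevert(?:ed|s)?\b",
    "vandal": r"\bvandal\w*",
}
comment = "Reverted edits by Example; reverts vandalism"
counts = count_patterns(comment, patterns)
empty_counts = count_patterns("", patterns)
```

Returning one integer per pattern keeps the output size fixed no matter how often or how widely a pattern matches.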
When you install this from git, you will need to first clone the repository::

  git clone git://projects.mako.cc/mediawiki_dump_tools

From within the repository working directory, initialize and set up the submodule like::

  git submodule init
  git submodule update

Wikimedia dumps are usually in a compressed format such as 7z (most common), gz, or bz2. Wikiq uses your computer's compression software to read these files. Therefore wikiq depends on `7za`, `gzcat`, and `zcat`.

There are also a series of Python dependencies. You can install these using pip with a command like::

  pip3 install mwbase mwreverts mwxml mwtypes mwcli mwdiffs mwpersistence
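To make the dependency on external compression tools concrete, here is a minimal sketch of how a script might pick a decompression command from a dump's file extension and stream the result. The mapping, the function names `decompression_command` and `open_dump`, and the exact command-line arguments are assumptions for illustration, not wikiq's actual code. The bz2 case is left out because the text above does not name a tool for it.

```python
import subprocess

# Assumed mapping from file extension to a command that writes the
# decompressed data to standard output. Only tools named in the README
# are used; the arguments are an illustrative choice.
DECOMPRESSORS = {
    ".7z": ["7za", "x", "-so"],
    ".gz": ["zcat"],
}


def decompression_command(filename):
    """Return the command list for `filename`, or None if no tool applies."""
    for extension, command in DECOMPRESSORS.items():
        if filename.endswith(extension):
            return command + [filename]
    return None


def open_dump(filename):
    """Return a binary file object that yields the decompressed dump."""
    command = decompression_command(filename)
    if command is None:
        # Not a recognized compressed file: read it directly.
        return open(filename, "rb")
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    return process.stdout
```

Streaming from the tool's standard output means the dump never has to be decompressed to disk first.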