============================
Wikia user roles scraper
============================
This package provides a pair of Python scripts that obtain data on the roles of MediaWiki users from the Wikia API. It is maintained by Nate TeBlunthuis (nathante@uw.edu).

Usage
=======
The scripts read a list of wikis, each with a name and a URL. See example/wikiList.csv for an example wiki list; it is comma-separated and has a header row. The listusers API provides data on current bots and administrators, while the logevents API provides historical data. Both are needed to identify bots and administrators over the entire history of a wiki. The output can be parsed with the RCommunityData package, available at code.communitydata.cc.

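A minimal sketch of reading such a wiki list, assuming the layout described above. The helper read_wiki_list is hypothetical and is not the package's own parser; its sep, header and indices parameters simply mirror the scripts' --sep, --no-header and -i options. The sample data is made up.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_wiki_list(
    lines: Iterable[str],
    sep: str = ",",
    header: bool = True,
    indices: Tuple[int, int] = (0, 1),
) -> List[Tuple[str, str]]:
    """Return (wiki, url) pairs from a delimited wiki list.

    Hypothetical helper: sep, header and indices mirror the scripts'
    --sep, --no-header and -i options.
    """
    reader = csv.reader(lines, delimiter=sep)
    if header:
        next(reader, None)  # skip the header row
    wiki_col, url_col = indices  # 0-based column positions, like -i j,k
    wikis = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        wikis.append((row[wiki_col].strip(), row[url_col].strip()))
    return wikis


if __name__ == "__main__":
    # Made-up example data; the real example/wikiList.csv may differ.
    sample = io.StringIO("wiki,url\nexamplewiki,http://examplewiki.wikia.com\n")
    print(read_wiki_list(sample))
    # prints [('examplewiki', 'http://examplewiki.wikia.com')]
```

A file without a header, or with the URL in the first column, would be read with header=False and indices=(1, 0).
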
The shell scripts scrape_log.sh and scrape_list.sh show example invocations of the Python programs.

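As noted above, both APIs are needed to cover a wiki's whole history. One way to see why: a rights log yields the periods during which a user held a role, but a user who held a role throughout, with no logged change, never appears in it, and only a list of current members catches that case. The sketch below is hypothetical; it assumes rights-log entries have already been reduced to (timestamp, user, old groups, new groups) tuples, which is not necessarily how the scripts store them.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical simplified rights-log entry:
# (ISO 8601 timestamp, user name, groups before, groups after)
RightsChange = Tuple[str, str, Set[str], Set[str]]
Interval = Tuple[Optional[str], Optional[str]]


def role_intervals(changes: Iterable[RightsChange], role: str) -> Dict[str, List[Interval]]:
    """Return, per user, the (start, end) periods during which they held `role`.

    start is None when the role was removed without a logged grant (for
    example, it was assigned before the log begins); end is None when the
    role was still held after the last logged change. Users who held the
    role with no logged change at all do not appear.
    """
    intervals: Dict[str, List[Interval]] = {}
    held_since: Dict[str, str] = {}
    # ISO 8601 timestamps in one format sort chronologically as strings.
    for ts, user, old, new in sorted(changes, key=lambda c: c[0]):
        if role in new and role not in old:
            held_since[user] = ts  # role granted
        elif role in old and role not in new:
            start = held_since.pop(user, None)  # role removed
            intervals.setdefault(user, []).append((start, ts))
    # Roles still held after the last logged change stay open-ended.
    for user, start in held_since.items():
        intervals.setdefault(user, []).append((start, None))
    return intervals
```
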
The scripts detect and log errors caused by deleted wikis and other cases where API data is unavailable.

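The README does not say how such errors are recognized. A minimal sketch, assuming a deleted or closed wiki shows up as a non-200 HTTP status or a non-JSON body (such as an HTML error page), and that API-level failures use MediaWiki's standard JSON error object with "code" and "info" fields. The function api_error is hypothetical.

```python
import json
from typing import Optional


def api_error(status: int, body: str) -> Optional[str]:
    """Describe why an API response is unusable, or return None if it looks fine.

    Hypothetical helper. Assumes a deleted or closed wiki shows up as a
    non-200 status or a non-JSON body (such as an HTML error page), and an
    API-level failure as MediaWiki's {"error": {"code": ..., "info": ...}}.
    """
    if status != 200:
        return f"HTTP {status}"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "response is not JSON (the wiki may be deleted or redirected)"
    if not isinstance(data, dict):
        return "unexpected JSON payload"
    if "error" in data:
        err = data["error"] if isinstance(data["error"], dict) else {}
        return f"API error {err.get('code', '?')}: {err.get('info', '')}"
    return None
```

A scraper could log the returned message together with the wiki's name and move on to the next wiki instead of aborting the whole run.
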
userroles_from_listusers.py
--------------------------------

::

   usage: userroles_from_listusers.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                      [-i I]
                                      wikilist output

   Get user roles for Wikis from the Mediawiki list users API

   positional arguments:
     wikilist     path to the input file: a wiki list with wiki url ilename
     output       path to put the logs we scrape e.g.
                  /com/projects/messagewalls/allusers/

   optional arguments:
     -h, --help   show this help message and exit
     --no-header  does the wikilist have no header?
     --nuke-old   remove old files
     --sep SEP    input table delimiter
     -i I         <j,k> two 0-based indices for wiki and url in the csv,
                  default=0,1

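The README does not document the exact request userroles_from_listusers.py sends. As a hedged illustration, the standard MediaWiki API lists current members of a group through list=allusers with the augroup parameter. The sketch builds such a query URL, assuming the API is served at the wiki URL followed by /api.php and that the wiki's MediaWiki version accepts several |-separated groups. allusers_query_url is a hypothetical helper.

```python
from typing import Sequence
from urllib.parse import urlencode


def allusers_query_url(wiki_url: str, groups: Sequence[str] = ("sysop", "bot")) -> str:
    """Build an api.php URL listing the current members of `groups`.

    Hypothetical helper; assumes api.php sits directly under wiki_url.
    """
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "|".join(groups),  # several groups, separated by |
        "auprop": "groups",  # also return each user's group list
        "aulimit": "max",  # largest page size the wiki allows
        "format": "json",
    }
    return wiki_url.rstrip("/") + "/api.php?" + urlencode(params)
```

The API pages its results, so a complete listing would also need to follow continuation from one response to the next.
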
userroles_from_logevents.py
---------------------------------

::

   usage: userroles_from_logevents.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                      [-i I] [--blocks-output BLOCKS_OUTPUT]
                                      wikilist output

   Get user roles for Wikis from the Mediawiki list users API

   positional arguments:
     wikilist              path to the input file: a wiki list with wiki url
                           ilename
     output                path to put the logs we scrape e.g.
                           /com/projects/messagewalls/allusers/

   optional arguments:
     -h, --help            show this help message and exit
     --no-header           does the wikilist have no header?
     --nuke-old            remove old files.
     --sep SEP             input table delimiter
     -i I                  <j,k> two 0-based indices for wiki and url in the csv,
                           default=0,1
     --blocks-output BLOCKS_OUTPUT
                           Path to output block event logs. If empty, blocks are
                           ignored.

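A minimal sketch of paging through a wiki's log with the standard MediaWiki list=logevents module, assuming that is roughly what userroles_from_logevents.py relies on: letype=rights selects user-rights changes and letype=block selects blocks, the kind of events --blocks-output refers to. The fetch callable is a hypothetical stand-in for an HTTP request to the wiki's api.php that returns decoded JSON, and continuation follows the newer "continue" protocol; older MediaWiki releases used "query-continue" instead.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical: sends params to a wiki's api.php and returns the decoded JSON.
Fetch = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_log_events(fetch: Fetch, letype: str = "rights") -> Iterator[Dict[str, Any]]:
    """Yield every log event of one type, oldest first, following continuation."""
    params: Dict[str, str] = {
        "action": "query",
        "list": "logevents",
        "letype": letype,  # "rights" for role changes, "block" for blocks
        "ledir": "newer",  # list from oldest to newest
        "lelimit": "max",
        "format": "json",
        "continue": "",  # opt in to the newer continuation protocol
    }
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("logevents", [])
        if "continue" not in data:
            break  # no more pages
        # Copy the continuation values into a fresh dict for the next request.
        params = {**params, **data["continue"]}
```

In practice fetch could wrap an HTTP client call such as requests.get(api_url, params=params).json(). Passing it in keeps the paging logic testable without network access.
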
License
=========
Copyright (C)  2018  Nathan TeBlunthuis.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the file entitled "fdl-1.3.md".