============================
Wikia user roles scraper
============================

This package provides a pair of Python scripts that obtain data on the roles of MediaWiki users from the Wikia API. It is maintained by Nate TeBlunthuis: nathante@uw.edu.

Usage
=======

The scripts read a list of wikis with their names and URLs. See example/wikiList.csv for an example wiki list; that file is comma-separated and has a header row. The list users API provides data on current bots and administrators, while the log events API provides historical data on role changes. Both are needed to identify bots and administrators over the entire history of a wiki. The scraped data can be parsed using the RCommunityData package found at code.communitydata.cc.

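The two data sources above can be illustrated with a minimal sketch of the API requests involved. It assumes each wiki serves the standard MediaWiki Action API at `<url>/api.php` and uses the core `list=allusers` and `list=logevents` modules; the scripts may instead query Wikia's own list users module, and the function names here are hypothetical.

```python
from urllib.parse import urlencode


# Assumption: the wiki serves the MediaWiki Action API at <wiki_url>/api.php.
def api_endpoint(wiki_url):
    return wiki_url.rstrip("/") + "/api.php"


def current_roles_query(wiki_url, groups=("sysop", "bot")):
    """Hypothetical: URL listing users now in the given groups (list=allusers)."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "|".join(groups),  # users belonging to any of these groups
        "auprop": "groups",           # return each user's group memberships
        "aulimit": "max",
        "format": "json",
    }
    return api_endpoint(wiki_url) + "?" + urlencode(params)


def role_history_query(wiki_url, letype="rights"):
    """Hypothetical: URL for a log of events of one type (list=logevents)."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": letype,   # "rights" for role changes, "block" for blocks
        "ledir": "newer",   # list the oldest events first
        "lelimit": "max",
        "format": "json",
    }
    return api_endpoint(wiki_url) + "?" + urlencode(params)
```

A real scraper also has to follow the API's continuation parameters (`continue`, or `query-continue` on older MediaWiki versions) to fetch more than one page of results.
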
The shell scripts scrape_log.sh and scrape_list.sh provide examples of how to use the Python programs.

The scripts detect and log errors caused by deleted wikis and other cases where the API data is unavailable.

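One way such failures might be detected is sketched below. `fetch_api_json` is a hypothetical helper, not part of the package; it assumes a deleted wiki surfaces as an HTTP error, a network error, a non-JSON page, or a MediaWiki `error` object, and logs the reason before returning `None`.

```python
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("userroles")


def fetch_api_json(url, opener=urllib.request.urlopen):
    """Return decoded JSON from an API URL, or None if the data is unavailable.

    Hypothetical helper: the failure cases below are assumptions about how a
    deleted or closed wiki might respond, not the scripts' actual checks.
    """
    try:
        with opener(url) as response:
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:  # e.g. 404 or 410 for a deleted wiki
        logger.error("HTTP %s for %s", err.code, url)
        return None
    except urllib.error.URLError as err:  # DNS failure, refused connection, ...
        logger.error("unreachable %s: %s", url, err.reason)
        return None
    try:
        data = json.loads(body)
    except json.JSONDecodeError:  # an HTML error page instead of JSON
        logger.error("non-JSON response from %s", url)
        return None
    if isinstance(data, dict) and "error" in data:  # MediaWiki API error object
        logger.error("API error from %s: %s", url, data["error"])
        return None
    return data
```

Passing the opener as a parameter keeps the function testable without network access; `HTTPError` is caught before `URLError` because it is a subclass of it.
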
userroles_from_listusers.py
--------------------------------

usage: userroles_from_listusers.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                   [-i I]
                                   wikilist output

Get user roles for wikis from the MediaWiki list users API

positional arguments:
  wikilist     path to the input file: a wiki list with wiki and url columns
  output       path to put the logs we scrape, e.g.
               /com/projects/messagewalls/allusers/

optional arguments:
  -h, --help   show this help message and exit
  --no-header  does the wikilist have no header?
  --nuke-old   remove old files
  --sep SEP    input table delimiter
  -i I         <j,k> two 0-based indices for wiki and url in the csv,
               default=0,1

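The options listed above map naturally onto Python's `argparse` and `csv` modules. The sketch below reconstructs the interface from the help text only; the default delimiter of `,` and the helper names `build_parser` and `read_wiki_list` are assumptions, not the script's actual source.

```python
import argparse
import csv


def build_parser(description):
    # Reconstructed from the help text; option semantics are inferred.
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("wikilist", help="path to the input file: a wiki list with wiki and url columns")
    parser.add_argument("output", help="path to put the logs we scrape")
    parser.add_argument("--no-header", action="store_true", help="does the wikilist have no header?")
    parser.add_argument("--nuke-old", action="store_true", help="remove old files")
    # Assumption: comma default, matching example/wikiList.csv.
    parser.add_argument("--sep", default=",", help="input table delimiter")
    parser.add_argument("-i", default="0,1",
                        help="<j,k> two 0-based indices for wiki and url in the csv, default=0,1")
    return parser


def read_wiki_list(lines, no_header=False, sep=",", indices="0,1"):
    """Yield (wiki, url) pairs from the rows of a wiki list (hypothetical helper)."""
    wiki_col, url_col = (int(x) for x in indices.split(","))
    reader = csv.reader(lines, delimiter=sep)
    if not no_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if row:  # ignore blank lines
            yield row[wiki_col].strip(), row[url_col].strip()
```

argparse stores `--no-header`, `--nuke-old` and `-i` as `args.no_header`, `args.nuke_old` and `args.i`, so an opened wiki list can be read with `read_wiki_list(f, args.no_header, args.sep, args.i)`.
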
userroles_from_logevents.py
---------------------------------

usage: userroles_from_logevents.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                   [-i I] [--blocks-output BLOCKS_OUTPUT]
                                   wikilist output

Get user roles for wikis from the MediaWiki log events API

positional arguments:
  wikilist     path to the input file: a wiki list with wiki and url columns
  output       path to put the logs we scrape, e.g.
               /com/projects/messagewalls/allusers/

optional arguments:
  -h, --help   show this help message and exit
  --no-header  does the wikilist have no header?
  --nuke-old   remove old files
  --sep SEP    input table delimiter
  -i I         <j,k> two 0-based indices for wiki and url in the csv,
               default=0,1
  --blocks-output BLOCKS_OUTPUT
               Path to output block event logs. If empty, blocks are
               ignored.

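The log events data is what lets the scripts recover roles held in the past. A minimal sketch of turning rights-log entries into role additions and removals follows. It assumes entries shaped like modern MediaWiki `list=logevents` output (a `params` object with `oldgroups` and `newgroups`); older MediaWiki versions may format rights entries differently. `role_changes` and `split_blocks` are hypothetical names, and `blocks_output` stands in for the `--blocks-output` option.

```python
def role_changes(events):
    """Turn rights-log events into (user, timestamp, group, action) tuples.

    Assumed event shape (modern MediaWiki logevents output):
    {"type": "rights", "title": "User:Name", "timestamp": "...",
     "params": {"oldgroups": [...], "newgroups": [...]}}
    """
    changes = []
    for ev in events:
        if ev.get("type") != "rights":
            continue
        user = ev["title"].split(":", 1)[-1]  # drop the "User:" namespace prefix
        old = set(ev["params"].get("oldgroups", []))
        new = set(ev["params"].get("newgroups", []))
        for group in sorted(new - old):
            changes.append((user, ev["timestamp"], group, "added"))
        for group in sorted(old - new):
            changes.append((user, ev["timestamp"], group, "removed"))
    return changes


def split_blocks(events, blocks_output=None):
    """Separate block events; drop them when no blocks output path is given."""
    rights = [ev for ev in events if ev.get("type") == "rights"]
    blocks = [ev for ev in events if ev.get("type") == "block"] if blocks_output else []
    return rights, blocks
```

Replaying these changes in timestamp order gives each user's group membership at any point in time, and the list users snapshot can serve as a check on the final state.
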
License
=========

Copyright (C) 2018 Nathan TeBlunthuis.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the file entitled "fdl-1.3.md".