Initialize the repository for the wikia user roles scraper project.
68
README.txt
Normal file
@@ -0,0 +1,68 @@
============================
Wikia user roles scraper
============================
This package provides a pair of Python scripts that obtain data on the roles of MediaWiki users from the Wikia API. It is maintained by Nate TeBlunthuis: nathante@uw.edu.

Usage
=======
The scripts read a list of wikis, each with a URL and a name. See example/wikiList.csv for an example wiki list; in this example the list is comma-separated and has a header. The listusers API provides data on current bots and administrators. The logevents API provides historical data. Both are needed to identify bots and administrators over the entire history of a wiki. The data can be parsed using the RCommunityData package found at code.communitydata.cc.
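
As a rough illustration of the input format, the sketch below reads a delimited wiki list and yields (wiki, url) pairs. It is not taken from the scripts themselves: the function name read_wikilist, its parameters, and the sample row are assumptions made for this example, and the real example/wikiList.csv may contain different columns.

```python
import csv
import io


def read_wikilist(handle, sep=",", has_header=True, indices=(0, 1)):
    """Yield (wiki, url) pairs from a delimited wiki list.

    `indices` holds the 0-based column positions of the wiki name and
    the wiki URL, in that order.
    """
    wiki_col, url_col = indices
    reader = csv.reader(handle, delimiter=sep)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield row[wiki_col], row[url_col]


# Hypothetical sample data, not the contents of example/wikiList.csv.
sample = io.StringIO("wiki,url\nexamplewiki,http://example.wikia.com\n")
pairs = list(read_wikilist(sample))
print(pairs)
```

Passing a different sep or indices covers lists that use another delimiter or column order.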

The shell scripts scrape_log.sh and scrape_list.sh provide examples of how to use the Python programs.

The scripts detect and log errors caused by deleted wikis and other cases where the API data is unavailable.
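
One way such error handling could look is sketched below. This is an assumption about the general approach, not the scripts' actual code: the function name parse_api_response and the logger name are made up for the example. It relies only on the fact that the MediaWiki API reports failures as a JSON object with an "error" key, and it guesses that a deleted wiki may answer with something that is not JSON at all.

```python
import json
import logging

logger = logging.getLogger("userroles")  # hypothetical logger name


def parse_api_response(wiki, body):
    """Return the decoded API payload, or None after logging a failure."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Assumption: a deleted wiki may return an HTML page, not JSON.
        logger.error("%s: response is not valid JSON", wiki)
        return None
    if not isinstance(payload, dict):
        logger.error("%s: unexpected response type", wiki)
        return None
    if "error" in payload:
        # MediaWiki API errors carry a machine code and a readable message.
        err = payload["error"]
        logger.error("%s: API error %s: %s", wiki, err.get("code"), err.get("info"))
        return None
    return payload


ok = parse_api_response("examplewiki", '{"query": {"allusers": []}}')
missing = parse_api_response("deletedwiki", "<html>Not a valid community</html>")
failed = parse_api_response("brokenwiki", '{"error": {"code": "readapidenied", "info": "denied"}}')
```

Returning None instead of raising lets a caller record the failure and move on to the next wiki in the list.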

userroles_from_listusers.py
--------------------------------

usage: userroles_from_listusers.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                   [-i I]
                                   wikilist output

Get user roles for Wikis from the Mediawiki list users API

positional arguments:
  wikilist     path to the input file: a wiki list with wiki url filename
  output       path to put the logs we scrape e.g.
               /com/projects/messagewalls/allusers/

optional arguments:
  -h, --help   show this help message and exit
  --no-header  does the wikilist have no header?
  --nuke-old   remove old files
  --sep SEP    input table delimiter
  -i I         <j,k> two 0-based indices for wiki and url in the csv,
               default=0,1
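
For readers who want to see how the options above fit together, here is a minimal argparse sketch reconstructed from the help text. It is not the script's source: the helper parse_indices, the function build_parser, and the "," default for --sep are assumptions, since the help output does not state them.

```python
import argparse


def parse_indices(text):
    """Turn 'j,k' into a (wiki_index, url_index) tuple of ints."""
    parts = text.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError("expected two comma-separated indices")
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError("indices must be integers")


def build_parser():
    parser = argparse.ArgumentParser(
        prog="userroles_from_listusers.py",
        description="Get user roles for Wikis from the Mediawiki list users API",
    )
    parser.add_argument("wikilist", help="path to the input file")
    parser.add_argument("output", help="path to put the logs we scrape")
    parser.add_argument("--no-header", action="store_true",
                        help="does the wikilist have no header?")
    parser.add_argument("--nuke-old", action="store_true",
                        help="remove old files")
    # Assumption: the real default delimiter is not documented.
    parser.add_argument("--sep", default=",", help="input table delimiter")
    parser.add_argument("-i", type=parse_indices, default=(0, 1), metavar="I",
                        help="<j,k> two 0-based indices for wiki and url")
    return parser


# Example: a tab-separated list whose url column comes before the wiki column.
args = build_parser().parse_args(
    ["--sep", "\t", "-i", "1,0", "--no-header", "wikis.tsv", "out/"]
)
print(args.i, args.no_header, args.wikilist, args.output)
```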

userroles_from_logevents.py
---------------------------------

usage: userroles_from_logevents.py [-h] [--no-header] [--nuke-old] [--sep SEP]
                                   [-i I] [--blocks-output BLOCKS_OUTPUT]
                                   wikilist output

Get user roles for Wikis from the Mediawiki list users API

positional arguments:
  wikilist              path to the input file: a wiki list with wiki url
                        filename
  output                path to put the logs we scrape e.g.
                        /com/projects/messagewalls/allusers/

optional arguments:
  -h, --help            show this help message and exit
  --no-header           does the wikilist have no header?
  --nuke-old            remove old files.
  --sep SEP             input table delimiter
  -i I                  <j,k> two 0-based indices for wiki and url in the csv,
                        default=0,1
  --blocks-output BLOCKS_OUTPUT
                        Path to output block event logs. If empty, blocks are
                        ignored.
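
To give a sense of what a log events request might look like, the sketch below builds a query URL for the standard MediaWiki list=logevents module, once for rights changes and once for blocks. Whether the script issues exactly these requests is an assumption: the function name logevents_url, the /api.php path, the chosen leprop fields, and the example host are all illustrative, and paging through continued results is left out.

```python
from urllib.parse import urlencode


def logevents_url(wiki_url, log_type, limit=500):
    """Build a MediaWiki API URL that lists log events of one type."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": log_type,  # e.g. "rights" or "block"
        "lelimit": limit,
        "leprop": "user|timestamp|type|details|comment|title",
        "format": "json",
    }
    # Assumption: the API endpoint lives at <wiki url>/api.php.
    return wiki_url.rstrip("/") + "/api.php?" + urlencode(params)


rights_url = logevents_url("http://example.wikia.com", "rights")
blocks_url = logevents_url("http://example.wikia.com/", "block")
print(rights_url)
print(blocks_url)
```

Rights events correspond to role changes, while block events correspond to the data written when --blocks-output is given.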

License
=========
Copyright (C) 2018 Nathan TeBlunthuis.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the file entitled "fdl-1.3.md".