Commit Graph

3 Commits

Author SHA1 Message Date
Benjamin Mako Hill
38fdd07b39 changes to a bunch of the wikipedia view code
- Renamed articles.txt to something more specific

Changes to both scripts:

- Updated filenames to match the new standard
- Reworked the logging code so that it writes to stderr by
  default. Because logging.basicConfig() can only be called once, this
  ended up being a bigger change.
- Made the scripts output and export the current git commit, to track
  which code produced which dataset.
- Made the programs take output files instead of output directories
  (this allows us to run them more than once a day).
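A minimal Python sketch of the logging, commit-tracking, and output-file changes listed above. It assumes only the standard library; the flags `-o/--output` and `-L/--logfile` and the helpers `get_git_commit`, `build_parser`, and `configure_logging` are hypothetical names for illustration, not the actual code in these scripts. The key point is that `logging.basicConfig()` has no effect once the root logger already has handlers, so the destination (stderr or a file) has to be decided before that single call.

```python
import argparse
import logging
import subprocess
import sys


def get_git_commit():
    """Return the HEAD commit hash, or "unknown" if git is unavailable."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_parser():
    # Hypothetical flags: the output is a *file*, not a directory, so the
    # script can be run several times a day without overwriting earlier runs.
    parser = argparse.ArgumentParser(description="Daily views fetcher (sketch).")
    parser.add_argument("-o", "--output", required=True,
                        help="output file to write")
    parser.add_argument("-L", "--logfile", default=None,
                        help="log file; logs go to stderr when omitted")
    return parser


def configure_logging(logfile=None):
    # basicConfig() does nothing if the root logger already has handlers
    # (unless force=True), so choose the destination before the one call.
    log_format = "%(asctime)s %(levelname)s: %(message)s"
    if logfile is None:
        logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                            format=log_format)
    else:
        logging.basicConfig(filename=logfile, filemode="a",
                            level=logging.INFO, format=log_format)


def main(argv=None):
    args = build_parser().parse_args(argv)
    configure_logging(args.logfile)
    # Record which code produced this dataset.
    logging.info("git commit: %s", get_git_commit())
    logging.info("writing output to %s", args.output)


if __name__ == "__main__":
    main()
```

Run with only `-o views.json`, this sketch sends its log lines (including the commit hash) to stderr; adding `-L run.log` appends them to a file instead.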

Changes to wikipedia_views/scripts/fetch_daily_views.py:

- Changed the output to a sequence of JSON dictionaries (one
  per line), per the standard we agreed to and the format used by
  Twitter, GitHub, and other dumps. The previous behavior was to
  output a single JSON list object.
- A number of other small changes and tweaks throughout.
2020-04-01 07:15:12 -07:00
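The one-dictionary-per-line output described in commit 38fdd07b39 (often called JSON Lines) can be sketched in Python as below, in contrast to the earlier behavior of writing one JSON list with a single `json.dump()`. The helper names and the record fields `article` and `views` are hypothetical and only illustrate the format.

```python
import io
import json


def write_json_lines(records, fh):
    """Write each record as one JSON object on its own line."""
    for record in records:
        fh.write(json.dumps(record) + "\n")


def read_json_lines(fh):
    """Yield one dict per non-empty line."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical records; field names are illustrative only.
    demo = [{"article": "Example_A", "views": 10},
            {"article": "Example_B", "views": 20}]
    buffer = io.StringIO()
    write_json_lines(demo, buffer)
    print(buffer.getvalue(), end="")
    buffer.seek(0)
    assert list(read_json_lines(buffer)) == demo
```

One practical advantage of this layout is that consumers can stream the file line by line, or append to it, without parsing one large list.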
Benjamin Mako Hill
72bf7bcd37 stop writing header to one-column list
This feels like it's asking for trouble. The filename already
describes the contents of the list.
2020-03-31 08:35:23 -07:00
Kaylea Champion
4226b45b97 adds a scraper to update the articles file
2020-03-28 23:46:48 -07:00