Benjamin Mako Hill
4fe5deb013
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-04-01 16:42:16 -05:00
Benjamin Mako Hill
d655e1ce93
tweaks to revision export code
- flags were not being exported (e.g., minor, anon)
- broke with hidden/deleted user names
2020-04-01 16:39:53 -05:00
Benjamin Mako Hill
3f19805d36
fix bug in rev scraper script
The bug was a `break`, added for debugging, that caused the script to
work only for the first article.
2020-04-01 15:49:28 -05:00
Benjamin Mako Hill
95d37cff7a
change copy to move in cron scripts
2020-04-01 15:49:02 -05:00
Benjamin Mako Hill
5739d1c404
Merge branch 'master' of github.com:makoshark/COVID-19_Digital_Observatory
2020-04-01 15:18:50 -05:00
Benjamin Mako Hill
141871eda6
add two small shellscripts for automation
- Added two bash scripts usable as cronjobs to automate the production
of revisions and view data.
These commands automate the process of running code and copying material
2020-04-01 15:15:11 -05:00
Benjamin Mako Hill
04e00f363b
address confusion with date
The timestamps in files should be the day that the exports are done. For
the view data, the query date needs to be the day before but this
shouldn't be the timestamp we use in files, etc.
2020-04-01 15:14:05 -05:00
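The date convention in this commit (file stamp is the export day, while the view query targets the previous day) can be sketched as below. This is a minimal illustration, not the project's code: `export_dates` is a hypothetical helper, and the `YYYYMMDD` string format is an assumption.

```python
from datetime import date, timedelta


def export_dates(today=None):
    """Return (file_stamp, query_date) as YYYYMMDD strings.

    Hypothetical helper: the file timestamp is the day the export runs,
    while the view-data query targets the previous day.
    """
    today = today or date.today()
    query_day = today - timedelta(days=1)
    return today.strftime("%Y%m%d"), query_day.strftime("%Y%m%d")


stamp, query = export_dates(date(2020, 4, 1))
print(stamp, query)  # 20200401 20200331
```

Keeping the two dates separate means a file written on April 1 is stamped April 1 even though its contents describe March 31.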
Benjamin Mako Hill
06d2fd1563
fix bugs with the date stamps
2020-04-01 10:47:33 -05:00
4f8a698c62
Merge pull request #11 from jdfoote/master
Adding a tidyverse example (with very verbose comments)
2020-04-01 10:41:02 -05:00
Benjamin Mako Hill
4e1b7fbdfe
fixed typo in debug message
2020-04-01 08:18:05 -07:00
Benjamin Mako Hill
061105b7b4
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-04-01 07:53:40 -07:00
Benjamin Mako Hill
268f9e1cf3
added gitignore for wikipedia/data directory
2020-04-01 07:52:15 -07:00
Benjamin Mako Hill
784458f206
renamed the wikipedia_views module to wikipedia
2020-04-01 07:51:20 -07:00
Benjamin Mako Hill
6493361fbd
added initial version of revision-scraper
Borrows much of the structure from the (patched) version of the
dailyview scraper.
2020-04-01 07:42:38 -07:00
Benjamin Mako Hill
cb26ecabda
fixed typo in description of view scraper
2020-04-01 07:42:24 -07:00
Benjamin Mako Hill
5c861cfca4
renamed daily views to make it clear that it's just enwiki
2020-04-01 07:29:45 -07:00
Benjamin Mako Hill
38fdd07b39
changes to a bunch of the wikipedia view code
- Renamed the articles.txt to something more specific
Changes to both scripts:
- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
  default. Because we can only call logging.basicConfig() once, this
  ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
  produced which dataset.
- Caused programs to take files instead of directories as
  output (allows us to run programs more than once a day).
Changes to the wikipedia_views/scripts/fetch_daily_views.py:
- Changed the output so that it is a sequence of JSON dictionaries (one
  per line), as per the standard we agreed to and which is what
  Twitter, GitHub, and other dumps do. Previous behavior was to
  output a single JSON list object.
- A number of other small changes and tweaks throughout.
2020-04-01 07:15:12 -07:00
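The switch from a single JSON list to one JSON dictionary per line (often called JSON Lines) can be illustrated with a short sketch. The helper names and the record fields (`article`, `views`) are hypothetical and do not reflect the scraper's actual schema.

```python
import io
import json


def write_jsonl(records, fh):
    """Write each dict as one JSON object per line (JSON Lines)."""
    for record in records:
        fh.write(json.dumps(record) + "\n")


def read_jsonl(fh):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


# Hypothetical records; field names are illustrative only.
rows = [{"article": "Coronavirus", "views": 100},
        {"article": "Pandemic", "views": 42}]
buf = io.StringIO()
write_jsonl(rows, buf)
buf.seek(0)
print(read_jsonl(buf) == rows)  # True
```

Unlike a single list object, this format can be appended to and read one record at a time without loading the whole file.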
Jeremy Foote
6b05896aa5
Adding a tidyverse example (with very verbose comments)
2020-03-31 22:42:31 -04:00
8bb3db8b46
add examples using the translations data
2020-03-31 16:56:59 -07:00
c8b886364f
add documentation for the output files
2020-03-31 16:22:30 -07:00
29ae62c83e
create 'latest.csv' to link to the most recent output.
2020-03-31 16:16:36 -07:00
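One way to keep a `latest.csv` pointing at the newest output is a symbolic link that is swapped atomically. This is a sketch under assumptions: the commit does not say whether a symlink or a copy is used, `update_latest_link` is a hypothetical helper, the dated filenames are invented, and the link and its target are assumed to live in the same directory (POSIX symlink semantics).

```python
import os
import tempfile


def update_latest_link(target, link_path):
    """Point link_path at target, replacing any existing link atomically.

    Assumes target and link_path share a directory, so a relative
    link to the target's basename is sufficient.
    """
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(target), tmp_link)
    os.replace(tmp_link, link_path)  # atomic rename on POSIX


# Demo in a throwaway directory with hypothetical dated filenames.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("20200330.csv", "20200331.csv"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write(name)
    latest = os.path.join(demo_dir, "latest.csv")
    update_latest_link(os.path.join(demo_dir, "20200330.csv"), latest)
    update_latest_link(os.path.join(demo_dir, "20200331.csv"), latest)
    demo_target = os.readlink(latest)
    with open(latest) as f:
        demo_contents = f.read()
```

Creating the new link under a temporary name and renaming it over the old one means readers never see a missing `latest.csv`.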
687da1284f
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-03-31 16:01:43 -07:00
603a7b6ec3
update output
2020-03-31 16:01:38 -07:00
74667cf4dc
use 'item' instead of 'entity'
2020-03-31 15:34:34 -07:00
3d142377ca
rename compile script
2020-03-31 15:27:39 -07:00
55110c7f21
update compile script
2020-03-31 15:27:21 -07:00
4fd516a700
Improve README.md for keywords
2020-03-31 15:25:51 -07:00
98b07b8098
rename 'transliterations' to 'keywords'
2020-03-31 15:15:01 -07:00
20ad09d155
Update README.md
linking to project pages more fully
2020-03-31 17:09:58 -05:00
10a7d915a5
Merge pull request #10 from makoshark/master
stop writing header to one-column list
2020-03-31 12:23:36 -07:00
Benjamin Mako Hill
72bf7bcd37
stop writing header to one-column list
This feels like it's asking for trouble. The description of the contents
of the list is in the filename.
2020-03-31 08:35:23 -07:00
09d171608f
reorganize file structure
- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv
2020-03-29 21:49:57 -07:00
Kaylea Champion
50f58a3887
migrating to new directory structure
2020-03-29 13:42:01 -05:00
a86c3a97ee
Merge pull request #7 from kayleachampion/master
cleanup with merge
2020-03-29 11:39:32 -07:00
Kaylea Champion
317c32cdb5
all march data
2020-03-29 00:19:54 -07:00
Kaylea Champion
3bd1c684df
adding a logs dir without adding my log files, assuming those don't belong in repo
2020-03-28 23:50:04 -07:00
Kaylea Champion
fa8e977741
new version of this from scrape. no double quotes around articles anymore
2020-03-28 23:47:55 -07:00
Kaylea Champion
4226b45b97
adds a scraper to update the articles file
2020-03-28 23:46:48 -07:00
Kaylea Champion
c7af46f8fb
adds in new logging capability
2020-03-28 18:46:35 -07:00
05b8025e15
Merge pull request #9 from aaronshaw/master
minimal analysis example with pageview data
2020-03-28 20:42:40 -05:00
aaronshaw
5dfbe3dab4
minimal analysis example with pageview data
2020-03-28 20:33:23 -05:00
c0e50fe297
Merge pull request #8 from aaronshaw/master
Update to load data from github url and include 3/28 data in output
2020-03-28 17:38:20 -05:00
aaronshaw
1f5b15f099
regenerated following update to R src that creates this file
2020-03-28 17:31:36 -05:00
aaronshaw
9e0c92242e
Loading data directly from github URL. Commenting out commands that assume cloned repository.
2020-03-28 17:30:37 -05:00
Kaylea Champion
7b3062ffb1
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-03-28 14:46:00 -07:00
033149776c
Merge pull request #5 from kayleachampion/master
view data
2020-03-28 14:17:21 -07:00
dd7d968bb6
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master
Some suggested changes.
2020-03-28 14:15:53 -07:00
c690df4852
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:13:46 -07:00
f5ac92330c
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:13:26 -07:00
1b2bb7d1df
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:12:36 -07:00