Benjamin Mako Hill
06d2fd1563
fix bugs with the date stamps
2020-04-01 10:47:33 -05:00
Benjamin Mako Hill
061105b7b4
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-04-01 07:53:40 -07:00
Benjamin Mako Hill
268f9e1cf3
added gitignore for wikipedia/data directory
2020-04-01 07:52:15 -07:00
Benjamin Mako Hill
784458f206
renamed the wikipedia_views module to wikipedia
2020-04-01 07:51:20 -07:00
Benjamin Mako Hill
6493361fbd
added initial version of revision-scraper
...
Borrows much of the structure from the (patched) version of the
dailyview scraper.
2020-04-01 07:42:38 -07:00
Benjamin Mako Hill
cb26ecabda
fixed typo in description of view scraper
2020-04-01 07:42:24 -07:00
Benjamin Mako Hill
5c861cfca4
renamed daily views to make it clear that it's just enwiki
2020-04-01 07:29:45 -07:00
Benjamin Mako Hill
38fdd07b39
changes to a bunch of the wikipedia view code
...
- Renamed the articles.txt to something more specific
Changes to both scripts:
- Updated filenames to match the new standard
- Reworked the logging code so that it can write to stderr by
default. Because we can only call logging.basicConfig() once, this
ended up being a bigger change.
- Caused scripts to output git commits and export to track which code
produced which dataset.
- Caused programs to take files instead of directories as
output (allows us to run programs more than once a day).
Changes to the wikipedia_views/scripts/fetch_daily_views.py:
- Changed the output so that it is a sequence of JSON dictionaries (one
per line), as per the standard we agreed to and which is what
Twitter, GitHub, and other dumps do. Previous behavior was to
output a single JSON list object.
- A number of other small changes and tweaks throughout.
2020-04-01 07:15:12 -07:00
8bb3db8b46
add examples using the translations data
2020-03-31 16:56:59 -07:00
c8b886364f
add documentation for the output files
2020-03-31 16:22:30 -07:00
29ae62c83e
create 'latest.csv' to link to the most recent output.
2020-03-31 16:16:36 -07:00
687da1284f
Merge branch 'master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-03-31 16:01:43 -07:00
603a7b6ec3
update output
2020-03-31 16:01:38 -07:00
74667cf4dc
use 'item' instead of 'entity'
2020-03-31 15:34:34 -07:00
3d142377ca
rename compile script
2020-03-31 15:27:39 -07:00
55110c7f21
update compile script
2020-03-31 15:27:21 -07:00
4fd516a700
Improve README.md for keywords
2020-03-31 15:25:51 -07:00
98b07b8098
rename 'transliterations' to 'keywords'
2020-03-31 15:15:01 -07:00
20ad09d155
Update README.md
...
linking to project pages more fully
2020-03-31 17:09:58 -05:00
10a7d915a5
Merge pull request #10 from makoshark/master
...
stop writing header to one-column list
2020-03-31 12:23:36 -07:00
Benjamin Mako Hill
72bf7bcd37
stop writing header to one-column list
...
This feels like it's asking for trouble. Description of the contents
of the list is in the filename.
2020-03-31 08:35:23 -07:00
09d171608f
reorganize file structure
...
- move 'input' files to resources
- outputs not meant for downstream go in output/intermediate
- csv outputs for downstream go in output/csv
2020-03-29 21:49:57 -07:00
Kaylea Champion
50f58a3887
migrating to new directory structure
2020-03-29 13:42:01 -05:00
a86c3a97ee
Merge pull request #7 from kayleachampion/master
...
cleanup with merge
2020-03-29 11:39:32 -07:00
Kaylea Champion
317c32cdb5
all march data
2020-03-29 00:19:54 -07:00
Kaylea Champion
3bd1c684df
adding a logs dir without adding my log files, assuming those don't belong in repo
2020-03-28 23:50:04 -07:00
Kaylea Champion
fa8e977741
new version of this from scrape. no double quotes around articles any more
2020-03-28 23:47:55 -07:00
Kaylea Champion
4226b45b97
adds a scraper to update the articles file
2020-03-28 23:46:48 -07:00
Kaylea Champion
c7af46f8fb
adds in new logging capability
2020-03-28 18:46:35 -07:00
05b8025e15
Merge pull request #9 from aaronshaw/master
...
minimal analysis example with pageview data
2020-03-28 20:42:40 -05:00
aaronshaw
5dfbe3dab4
minimal analysis example with pageview data
2020-03-28 20:33:23 -05:00
c0e50fe297
Merge pull request #8 from aaronshaw/master
...
Update to load data from github url and include 3/28 data in output
2020-03-28 17:38:20 -05:00
aaronshaw
1f5b15f099
regenerated following update to R src that creates this file
2020-03-28 17:31:36 -05:00
aaronshaw
9e0c92242e
Loading data directly from github URL. Commenting out commands that assume cloned repository.
2020-03-28 17:30:37 -05:00
Kaylea Champion
7b3062ffb1
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
2020-03-28 14:46:00 -07:00
033149776c
Merge pull request #5 from kayleachampion/master
...
view data
2020-03-28 14:17:21 -07:00
dd7d968bb6
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master
...
Some suggested changes.
2020-03-28 14:15:53 -07:00
c690df4852
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:13:46 -07:00
f5ac92330c
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:13:26 -07:00
1b2bb7d1df
Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master
2020-03-28 14:12:36 -07:00
ee91df4c04
Read the whole input file before making api calls
2020-03-28 14:12:17 -07:00
24e5590836
Read the whole input file before making api calls
2020-03-28 14:09:28 -07:00
groceryheist
0fb8ac2ed9
Merge pull request #4 from CommunityDataScienceCollective/translations
...
Transliterations: Use data from google trends and wikidata to find transliterations.
2020-03-28 14:07:04 -07:00
2b56ed26f4
Update transliteration results for 2020-03-28
...
- renamed results from yesterday into time stamped file
2020-03-28 14:03:16 -07:00
207b1f8b95
Read entire input files before making api calls.
...
This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
2020-03-28 13:55:52 -07:00
282208507a
Keep better track of time.
...
- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting
2020-03-28 13:52:54 -07:00
Kaylea Champion
ed0641ecc7
Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
...
updates my branch with all the master changes so far
2020-03-28 12:21:37 -07:00
Kaylea Champion
cd08294288
trialing new approach
2020-03-28 12:18:01 -07:00
Kaylea Champion
c677d8d70a
trialing new approach
2020-03-28 12:17:45 -07:00
e720653a23
typo fix
2020-03-28 10:01:43 -07:00