Commit Graph

52 Commits

Author SHA1 Message Date
Kaylea Champion
317c32cdb5 all march data 2020-03-29 00:19:54 -07:00
Kaylea Champion
3bd1c684df adding a logs dir without adding my log files, assuming those don't belong in repo 2020-03-28 23:50:04 -07:00
Kaylea Champion
fa8e977741 new version of this from scrape. no double quotes around articles any more 2020-03-28 23:47:55 -07:00
Kaylea Champion
4226b45b97 adds a scraper to update the articles file 2020-03-28 23:46:48 -07:00
Kaylea Champion
c7af46f8fb adds in new logging capability 2020-03-28 18:46:35 -07:00
Kaylea Champion
7b3062ffb1 Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory 2020-03-28 14:46:00 -07:00
033149776c
Merge pull request #5 from kayleachampion/master
view data
2020-03-28 14:17:21 -07:00
dd7d968bb6
Merge pull request #1 from CommunityDataScienceCollective/kaylea/master
Some suggested changes.
2020-03-28 14:15:53 -07:00
c690df4852 Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master 2020-03-28 14:13:46 -07:00
f5ac92330c Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master 2020-03-28 14:13:26 -07:00
1b2bb7d1df Merge branch 'kaylea/master' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into kaylea/master 2020-03-28 14:12:36 -07:00
ee91df4c04 Read the whole input file before making api calls 2020-03-28 14:12:17 -07:00
24e5590836 Read the whole input file before making api calls 2020-03-28 14:09:28 -07:00
groceryheist
0fb8ac2ed9
Merge pull request #4 from CommunityDataScienceCollective/translations
Transliterations: Use data from google trends and wikidata to find transliterations.
2020-03-28 14:07:04 -07:00
2b56ed26f4 Update transliteration results for 2020-03-28
- renamed results from yesterday into time stamped file
2020-03-28 14:03:16 -07:00
207b1f8b95 Read entire input files before making api calls.
This is nicer style to not hold onto resources for as long.
It will use a bit more memory.
2020-03-28 13:55:52 -07:00
282208507a Keep better track of time.
- Add timestamp to transliterations output file.
- Append wikidata search terms instead of overwriting
2020-03-28 13:52:54 -07:00
Kaylea Champion
ed0641ecc7 Merge branch 'master' of https://github.com/CommunityDataScienceCollective/COVID-19_Digital_Observatory
updates my branch with all the master changes so far
2020-03-28 12:21:37 -07:00
Kaylea Champion
cd08294288 trialing new approach 2020-03-28 12:18:01 -07:00
Kaylea Champion
c677d8d70a trialing new approach 2020-03-28 12:17:45 -07:00
e720653a23 typo fix 2020-03-28 10:01:43 -07:00
a9f129f1d6 Merge branch 'translations' of github.com:CommunityDataScienceCollective/COVID-19_Digital_Observatory into translations 2020-03-28 09:58:43 -07:00
18118328cc
Merge pull request #6 from aaronshaw/translations
minimal example in R
2020-03-28 10:28:41 -05:00
aaronshaw
c025a526e8 a minimal example in R that outputs a table of top 5 related search terms per day per query 2020-03-28 10:18:33 -05:00
49c3203d78 A few suggestions for the python script:
- using format strings (f-strings) is a nice way in python to build strings using variables.
- you can read and process a file in one pass if you iterate over the open file itself instead of reading it into a variable and then looping
- i had to change your strip code when i stopped using csv reader
- my python linter and auto-formatter hate non-indented comments
- i added a few lines to print cases where we don't get Ok responses.
2020-03-27 20:43:29 -07:00
c54d8ba28a Reorganize wikipedia views subproject into subpackage. 2020-03-27 20:13:11 -07:00
6e7afee8b3 add mwapi to requirements 2020-03-27 20:05:07 -07:00
Kaylea Champion
5ffb2cacd6 all data 2020-03-27 18:24:19 -07:00
Kaylea Champion
7d7fe9aaf6 cleaning out commented code 2020-03-27 18:19:22 -07:00
Kaylea Champion
d845c30455 reorganizes comments 2020-03-27 18:17:39 -07:00
Kaylea Champion
7ab95ae5f6 initial files 2020-03-27 18:10:13 -07:00
Kaylea Champion
e71b896cec makes TSV
makes JSON
2020-03-27 18:08:43 -07:00
Kaylea Champion
0cc1ffd0b6 many bug fixes 2020-03-27 17:24:18 -07:00
070a1623aa add output files from transliteration search using google trends 2020-03-27 16:53:03 -07:00
f548eeedd5 expand wikidata search to get keywords from google trends 2020-03-27 16:52:19 -07:00
Kaylea Champion
ec3f66bbcc for testing 2020-03-27 16:00:36 -07:00
Benjamin Mako Hill
3584dccef3
Merge pull request #3 from kayleachampion/master
adding in an article list
2020-03-27 14:47:15 -07:00
Kaylea Champion
9f0dbb00f7 new file -- list of article names 2020-03-27 14:41:38 -07:00
8668a764ad start keeping track of installation requirements 2020-03-27 10:55:24 -07:00
96286a07d8 update output using limited base terms list 2020-03-26 11:14:55 -07:00
6905a52044 shell script to run the whole process 2020-03-26 11:13:23 -07:00
36d9a1aa8a narrow base terms 2020-03-26 10:24:31 -07:00
groceryheist
f35216cfce
Merge pull request #2 from aaronshaw/patch-1
Update base_terms.txt
2020-03-26 10:23:27 -07:00
d75ef80b4a
Update base_terms.txt
typo fix
2020-03-25 20:28:20 -05:00
36167295ec Finish MVP for transliterations
code is reasonably well-written
checked that we get seemingly good data back
adding README
adding data
2020-03-24 22:06:45 -07:00
308d462e76 Untested code to get labels from wikidata in all languages. 2020-03-24 18:04:22 -07:00
836098461e Python code to find wikidata entities to translate. Here we search the api for entities that have covid keywords.
Building system for finding translations from Wikidata.
2020-03-24 15:03:47 -07:00
groceryheist
3441ae7adf
Merge pull request #1 from kayleachampion/patch-1
Update README.md
2020-03-24 14:31:36 -07:00
b648ae82c2
Update README.md
some language nicing and adding in immediacy goal
2020-03-24 14:28:17 -07:00
becbaa1676
Update README.md
language nicing
2020-03-24 14:25:43 -07:00