Commit Graph

9 Commits

Author SHA1 Message Date
Nate E TeBlunthuis
6dca79a41f Rename spark script to reflect that it is for comments. 2020-07-03 14:00:36 -07:00
Nate E TeBlunthuis
94c7a74bd9 update .gitignore 2020-07-03 13:55:25 -07:00
Nate E TeBlunthuis
4dd9a210e6 bugfix in retrieving old data and rename file. 2020-07-03 13:54:55 -07:00
Nate E TeBlunthuis
c972d828b3 Script for checking shas for submissions. 2020-07-03 13:35:46 -07:00
Nate E TeBlunthuis
7da18e33ba Bugfix: use timestamp types
Also change the canonical file path.
2020-07-03 11:38:43 -07:00
Nate E TeBlunthuis
db2d6248fc update the reddit comment dumps 2020-07-03 10:41:13 -07:00
Nate E TeBlunthuis
d05da6441f Don't clobber old dumps so that we can just download the new ones. 2020-07-03 10:40:43 -07:00
Nate E TeBlunthuis
592d2c7dda script for getting submissions dumps from pushshift. 2020-07-02 17:40:17 -07:00
Nate E TeBlunthuis
64e9408a65 Extract variables from pushshift comment to parquet
A spark script
2020-07-02 14:35:55 -07:00