Nate E TeBlunthuis | 4dd9a210e6 | 2020-07-03 13:54:55 -07:00
bugfix in retrieving old data and rename file.

Nate E TeBlunthuis | c972d828b3 | 2020-07-03 13:35:46 -07:00
Script for checking shas for submissions.

Nate E TeBlunthuis | 7da18e33ba | 2020-07-03 11:38:43 -07:00
Bugfix: use timestamp types
Also change the canonical file path.

Nate E TeBlunthuis | db2d6248fc | 2020-07-03 10:41:13 -07:00
update the reddit comment dumps

Nate E TeBlunthuis | d05da6441f | 2020-07-03 10:40:43 -07:00
Don't clobber old dumps so that we can just download the new ones.

Nate E TeBlunthuis | 592d2c7dda | 2020-07-02 17:40:17 -07:00
script for getting submissions dumps from pushshift.

Nate E TeBlunthuis | 64e9408a65 | 2020-07-02 14:35:55 -07:00
Extract variables from pushshift comment to parquet
A spark script