2020-08-04 13:24:37 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| examples | clean up comments in streaming example. | 2020-07-07 12:28:57 -07:00 |
| .gitignore | update .gitignore | 2020-07-07 12:28:44 -07:00 |
| check_comments_shas.py | Check the shas when we download dumps | 2020-07-06 23:31:52 -07:00 |
| check_submission_shas.py | Script for checking shas for submissions. | 2020-07-03 13:35:46 -07:00 |
| comments_2_parquet_part1.py | Build comments dataset similarly to submissions and improve partitioning scheme | 2020-07-07 11:45:43 -07:00 |
| comments_2_parquet_part2.py | Build comments dataset similarly to submissions and improve partitioning scheme | 2020-07-07 11:45:43 -07:00 |
| comments_2_parquet.sh | Bugfixes in scripts. | 2020-07-07 23:29:36 -07:00 |
| helper.py | Bugfixes in scripts. | 2020-07-07 23:29:36 -07:00 |
| pull_pushshift_comments.sh | Check the shas when we download dumps | 2020-07-06 23:31:52 -07:00 |
| pull_pushshift_submissions.sh | Check the shas when we download dumps | 2020-07-06 23:31:52 -07:00 |
| sort_tf_comments.py | code to sort tf | 2020-08-03 17:56:36 -07:00 |
| submissions_2_parquet_part1.py | remove is_submitter field from submissions which doesn't exist. | 2020-07-09 17:12:14 -07:00 |
| submissions_2_parquet_part2.py | Bugfixes in scripts. | 2020-07-07 23:29:36 -07:00 |
| submissions_2_parquet.sh | Build comments dataset similarly to submissions and improve partitioning scheme | 2020-07-07 11:45:43 -07:00 |
| tf_reddit_comments.py | Improve tokenization following data. Generate author counts. | 2020-08-04 13:24:37 -07:00 |