Nathan TeBlunthuis
|
355d014d5f
|
pass path into tfidf function.
|
2024-12-02 08:03:19 -08:00 |
|
Nathan TeBlunthuis
|
5a131053af
|
spark config tweaks.
|
2024-12-01 15:41:47 -08:00 |
|
Nathan TeBlunthuis
|
224fb89317
|
bugfix.
|
2024-12-01 15:28:25 -08:00 |
|
Nathan TeBlunthuis
|
b25c332cea
|
typo fix.
|
2024-12-01 15:27:16 -08:00 |
|
Nathan TeBlunthuis
|
613059737a
|
set os environment for big machine
|
2024-12-01 15:25:18 -08:00 |
|
Nathan TeBlunthuis
|
abe217d2d5
|
fix configuration code
|
2024-12-01 15:21:51 -08:00 |
|
Nathan TeBlunthuis
|
9911f758f9
|
set memory usage.
|
2024-12-01 14:55:38 -08:00 |
|
Nathan TeBlunthuis
|
a31d8b26eb
|
correct tf_name
|
2024-12-01 14:38:48 -08:00 |
|
Nathan TeBlunthuis
|
e40cc45d40
|
bugfix.
|
2024-12-01 14:10:47 -08:00 |
|
Nathan TeBlunthuis
|
d61746c9f7
|
make the output authors path.
|
2024-12-01 13:58:13 -08:00 |
|
Nathan TeBlunthuis
|
9df9a8b8ff
|
rename function.
|
2024-12-01 13:44:19 -08:00 |
|
Nathan TeBlunthuis
|
3fea1f9388
|
sort and partition the term frequencies using spark.
|
2024-12-01 13:42:13 -08:00 |
|
Nathan TeBlunthuis
|
2b023fea8d
|
bugfix
|
2024-12-01 09:58:09 -08:00 |
|
Nathan TeBlunthuis
|
88fca0f82b
|
allow posts schemas to be nullable.
|
2024-12-01 09:55:12 -08:00 |
|
Nathan TeBlunthuis
|
271cbea7d9
|
add a 'limit' parameter for testing.
|
2024-12-01 09:51:49 -08:00 |
|
Nathan TeBlunthuis
|
4218bf864b
|
debugging.
|
2024-12-01 09:39:50 -08:00 |
|
Nathan TeBlunthuis
|
22d6a6961c
|
allow authors to be null in submissions.
|
2024-11-27 20:04:05 -08:00 |
|
Nathan TeBlunthuis
|
a5ca25dd6e
|
bugfix.
|
2024-11-27 19:56:06 -08:00 |
|
Nathan TeBlunthuis
|
2e5181602b
|
bugfix.
|
2024-11-27 19:53:04 -08:00 |
|
Nathan TeBlunthuis
|
0d7f4d3cec
|
pass through stopWords.
|
2024-11-27 19:33:28 -08:00 |
|
Nathan TeBlunthuis
|
5d48c0eb55
|
pass through mwe_tokenize
|
2024-11-27 19:31:59 -08:00 |
|
Nathan TeBlunthuis
|
91cc1edf02
|
pass through mwe_pass
|
2024-11-27 19:20:49 -08:00 |
|
Nathan TeBlunthuis
|
2decdc9750
|
move function to outer scope.
|
2024-11-27 19:13:49 -08:00 |
|
Nathan TeBlunthuis
|
7da046735b
|
move function to outer scope.
|
2024-11-27 19:10:34 -08:00 |
|
Nathan TeBlunthuis
|
0631256956
|
make the output directory.
|
2024-11-27 19:06:24 -08:00 |
|
Nathan TeBlunthuis
|
8cb9683bc2
|
bugfix
|
2024-11-27 19:03:52 -08:00 |
|
Nathan TeBlunthuis
|
587e1c0022
|
bugfix.
|
2024-11-27 18:56:22 -08:00 |
|
Nathan TeBlunthuis
|
78eb16f4d6
|
more path munging.
|
2024-11-27 18:53:16 -08:00 |
|
Nathan TeBlunthuis
|
a0a6a08bf2
|
handle case where we're in a parent directory.
|
2024-11-27 18:49:03 -08:00 |
|
Nathan TeBlunthuis
|
a84b633641
|
add absolute path to call.
|
2024-11-27 18:42:29 -08:00 |
|
Nathan TeBlunthuis
|
ce7b5f92eb
|
bugfix.
|
2024-11-27 17:20:04 -08:00 |
|
Nathan TeBlunthuis
|
fbf905c740
|
rename file
|
2024-11-27 11:55:31 -08:00 |
|
Nathan TeBlunthuis
|
dd894ebf61
|
support posts in ngrams
|
2024-11-27 11:51:22 -08:00 |
|
|
53f5b8c03c
|
add note to try other tf normalization strategies.
|
2022-03-31 12:17:16 -07:00 |
|
|
14ab979f59
|
Merge branch 'master' of code:cdsc_reddit
|
2021-08-03 15:03:40 -07:00 |
|
Nate E TeBlunthuis
|
c6122bb429
|
Merge branch 'master' of code:cdsc_reddit
|
2021-07-28 15:32:21 -07:00 |
|
Nate E TeBlunthuis
|
596e1ff339
|
no longer do we need to get daily dumps
|
2021-07-28 15:32:04 -07:00 |
|
Nate E TeBlunthuis
|
6a3bfa26ee
|
bugfix
|
2021-04-26 22:31:05 -07:00 |
|
Nate E TeBlunthuis
|
3a758f1fc8
|
Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch
|
2021-04-26 13:58:25 -07:00 |
|
Nate E TeBlunthuis
|
806cfc948f
|
support passing in list of tfidf vectors.
Also lowercases included subreddits.
|
2021-04-26 13:20:43 -07:00 |
|
Nate E TeBlunthuis
|
0fe120e4ab
|
support passing in list of tfidf vectors.
Also lowercases included subreddits.
|
2021-04-26 11:44:56 -07:00 |
|
Nate E TeBlunthuis
|
f20365c07e
|
Merge branch 'master' of code:cdsc_reddit
|
2021-04-22 10:46:26 -07:00 |
|
Nate E TeBlunthuis
|
34e0a0a30d
|
version of weekly_cosine_similarities.py from klone
|
2021-04-22 10:38:10 -07:00 |
|
Nate E TeBlunthuis
|
003a48aea5
|
bugfix in weekly similarities
|
2021-04-22 10:37:04 -07:00 |
|
Nate E TeBlunthuis
|
37dd0ef55f
|
bugfixes in clustering selection.
|
2021-04-21 16:56:25 -07:00 |
|
Nate E TeBlunthuis
|
ac06a8757a
|
calculate some user-level attributes to detect bots
|
2021-04-20 11:34:36 -07:00 |
|
Nate E TeBlunthuis
|
01a4c35358
|
grid sweep selection for clustering hyperparameters
|
2021-04-20 11:33:54 -07:00 |
|
Nate E TeBlunthuis
|
628a70734b
|
Merge branch 'master' of code:cdsc_reddit
|
2021-04-05 23:21:35 -07:00 |
|
Nate E TeBlunthuis
|
f0176d9f0d
|
Changes for cosine similarities on klone.
|
2021-04-05 23:21:06 -07:00 |
|
Nate E TeBlunthuis
|
36cb0a5546
|
add code for pulling activity time series from parquet.
|
2021-03-24 16:08:57 -07:00 |
|