Commit Graph

114 Commits

Author SHA1 Message Date
65deba5e4e Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2022-01-19 14:01:44 -08:00
7b130a30af commit changes from smap project. 2022-01-19 13:57:02 -08:00
98c1317af5 update pushshift dumps. 2021-12-10 21:23:32 -08:00
541e125b28 lsi support for weekly similarities 2021-08-11 22:48:33 -07:00
b7c39a3494 Merge branch 'master' of code:cdsc_reddit into excise_reindex 2021-08-03 15:13:39 -07:00
ce549c6c97 Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2021-08-03 15:13:21 -07:00
6e43294a41 Updates to similarities code for smap project. 2021-08-03 15:06:48 -07:00
14ab979f59 Merge branch 'master' of code:cdsc_reddit 2021-08-03 15:03:40 -07:00
2d21ff1137 Merge branch 'master' of code:cdsc_reddit into excise_reindex 2021-08-03 15:02:08 -07:00
Nate E TeBlunthuis
cf86c7492c update clustering scripts 2021-08-03 14:55:02 -07:00
Nate E TeBlunthuis
c6122bb429 Merge branch 'master' of code:cdsc_reddit 2021-07-28 15:32:21 -07:00
Nate E TeBlunthuis
596e1ff339 no longer do we need to get daily dumps 2021-07-28 15:32:04 -07:00
Nate E TeBlunthuis
87ffaa6858 script for picking the best clustering given constraints 2021-05-14 19:10:36 -07:00
Nate E TeBlunthuis
7b14db67de Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2021-05-13 22:28:31 -07:00
Nate E TeBlunthuis
0b95bea30e support isolates in visualization 2021-05-13 22:26:58 -07:00
Nate E TeBlunthuis
582cf263ea bug fix in affinity clustering 2021-05-13 22:26:15 -07:00
Nate E TeBlunthuis
8a2248fae1 Merge remote-tracking branch 'origin/excise_reindex' into temp 2021-05-10 18:32:03 -07:00
Nate E TeBlunthuis
47ba04aa97 add script for pulling cluster timeseries 2021-05-10 18:24:22 -07:00
Nate E TeBlunthuis
4cb7eeec80 Refactor to make a decent api. 2021-05-10 13:46:49 -07:00
Nate E TeBlunthuis
f05cb962e0 refactor clustring in object oriented style 2021-05-07 22:33:26 -07:00
Nate E TeBlunthuis
8d1df5b26e refactor clustering.py into method-specific files. 2021-05-03 11:28:48 -07:00
Nate E TeBlunthuis
e1c9d9af6f Remove 'exclude phrases' parameter. 2021-05-03 10:37:09 -07:00
Nate E TeBlunthuis
7df8436067 Use Latent semantic indexing and hdbscan 2021-05-02 23:39:55 -07:00
Nate E TeBlunthuis
36b24ee933 reindex tfidf in memory instead of using spark 2021-04-30 12:48:19 -07:00
Nate E TeBlunthuis
6a3bfa26ee bugfix 2021-04-26 22:31:05 -07:00
Nate E TeBlunthuis
3a758f1fc8 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch 2021-04-26 13:58:25 -07:00
Nate E TeBlunthuis
806cfc948f support passing in list of tfidf vectors. Also lowercases included subreddits. 2021-04-26 13:20:43 -07:00
Nate E TeBlunthuis
0fe120e4ab support passing in list of tfidf vectors. Also lowercases included subreddits. 2021-04-26 11:44:56 -07:00
Nate E TeBlunthuis
f20365c07e Merge branch 'master' of code:cdsc_reddit 2021-04-22 10:46:26 -07:00
Nate E TeBlunthuis
34e0a0a30d version of weekly_cosine_similarities.py from klone 2021-04-22 10:38:10 -07:00
Nate E TeBlunthuis
003a48aea5 bugfix in weekly similarities 2021-04-22 10:37:04 -07:00
Nate E TeBlunthuis
37dd0ef55f bugfixes in clustering selection. 2021-04-21 16:56:25 -07:00
Nate E TeBlunthuis
ac06a8757a calculate some user-level attributes to detect bots 2021-04-20 11:34:36 -07:00
Nate E TeBlunthuis
01a4c35358 grid sweep selection for clustering hyperparameters 2021-04-20 11:33:54 -07:00
Nate E TeBlunthuis
628a70734b Merge branch 'master' of code:cdsc_reddit 2021-04-05 23:21:35 -07:00
Nate E TeBlunthuis
f0176d9f0d Changes for cosine similarities on klone. 2021-04-05 23:21:06 -07:00
Nate E TeBlunthuis
a013f6718b export timeseries functions 2021-03-24 17:18:30 -07:00
Nate E TeBlunthuis
36cb0a5546 add code for pulling activity time series from parquet. 2021-03-24 16:08:57 -07:00
Nate E TeBlunthuis
06430903f0 add included_subreddits parameter to cosine similarities. 2021-02-22 18:38:34 -08:00
Nate E TeBlunthuis
4dc949de5f Changes from hyak. 2021-02-22 16:03:48 -08:00
Nate E TeBlunthuis
140d1bdd17 fix bug in viz. 2021-01-27 20:26:15 -08:00
Nate E TeBlunthuis
554660275f add visualization for 10000 subreddits based on author-tf similarities. 2021-01-27 20:22:24 -08:00
Nate E TeBlunthuis
b4dd9acbd8 Merge branch 'master' of code:cdsc_reddit 2021-01-27 20:09:23 -08:00
dbe4c87f8b add cluster selection to visualization 2021-01-27 20:08:07 -08:00
Nate E TeBlunthuis
3155600514 remove nsfw subs from topN 2020-12-28 21:11:44 -08:00
Nate E TeBlunthuis
4e20dce188 Updating to support wang-style user overlaps. 2020-12-24 22:38:04 -08:00
Nate E TeBlunthuis
56269deee3 Some improvements to run affinity clustering on larger dataset and compute density. 2020-12-12 20:42:47 -08:00
Nate E TeBlunthuis
e6294b5b90 Refactor and reorganze. 2020-12-08 17:32:20 -08:00
Nate E TeBlunthuis
a60747292e Add code for running tf-idf at the weekly level. 2020-12-01 22:54:48 -08:00
db5879d6c9 refactor visualization code. 2020-11-17 16:46:49 -08:00