
187 Commits

Author SHA1 Message Date
Nathan TeBlunthuis 2e5181602b bugfix. 2024-11-27 19:53:04 -08:00
Nathan TeBlunthuis 0d7f4d3cec pass through stopWords. 2024-11-27 19:33:28 -08:00
Nathan TeBlunthuis 5d48c0eb55 pass through mwe_tokenize 2024-11-27 19:31:59 -08:00
Nathan TeBlunthuis 91cc1edf02 pass through mwe_pass 2024-11-27 19:20:49 -08:00
Nathan TeBlunthuis 2decdc9750 move function to outer scope. 2024-11-27 19:13:49 -08:00
Nathan TeBlunthuis 7da046735b move function to outer scope. 2024-11-27 19:10:34 -08:00
Nathan TeBlunthuis 0631256956 make the output directory. 2024-11-27 19:06:24 -08:00
Nathan TeBlunthuis 8cb9683bc2 bugfix 2024-11-27 19:03:52 -08:00
Nathan TeBlunthuis 587e1c0022 bugfix. 2024-11-27 18:56:22 -08:00
Nathan TeBlunthuis 78eb16f4d6 more path munging. 2024-11-27 18:53:16 -08:00
Nathan TeBlunthuis a0a6a08bf2 handle case where we're in a parent directory. 2024-11-27 18:49:03 -08:00
Nathan TeBlunthuis a84b633641 add absolute path to call. 2024-11-27 18:42:29 -08:00
Nathan TeBlunthuis ce7b5f92eb bugfix. 2024-11-27 17:20:04 -08:00
Nathan TeBlunthuis fbf905c740 rename file 2024-11-27 11:55:31 -08:00
Nathan TeBlunthuis dd894ebf61 support posts in ngrams 2024-11-27 11:51:22 -08:00
9345f9de94 make pass keyword arg to dataframe.drop 2023-05-31 09:47:21 -07:00
07b0dff9bc changes for archiving. 2023-05-23 17:18:19 -07:00
811a0d87c4 changes from dirty branch. 2023-05-18 10:29:08 -07:00
c190791364 add 2 more umap parameters 2022-06-08 17:27:37 -07:00
5a40465a62 add support for umap->hdbscan clustering method 2022-06-08 17:01:27 -07:00
55b75ea6fc Merge remote-tracking branch 'refs/remotes/origin/excise_reindex' into excise_reindex 2022-04-06 11:14:13 -07:00
197518a222 git-annex in 2022-04-06 11:11:11 -07:00
53f5b8c03c add note to try other tf normalization strategies. 2022-03-31 12:17:16 -07:00
65deba5e4e Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2022-01-19 14:01:44 -08:00
7b130a30af commit changes from smap project. 2022-01-19 13:57:02 -08:00
98c1317af5 update pushshift dumps. 2021-12-10 21:23:32 -08:00
541e125b28 lsi support for weekly similarities 2021-08-11 22:48:33 -07:00
b7c39a3494 Merge branch 'master' of code:cdsc_reddit into excise_reindex 2021-08-03 15:13:39 -07:00
ce549c6c97 Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2021-08-03 15:13:21 -07:00
6e43294a41 Updates to similarities code for smap project. 2021-08-03 15:06:48 -07:00
14ab979f59 Merge branch 'master' of code:cdsc_reddit 2021-08-03 15:03:40 -07:00
2d21ff1137 Merge branch 'master' of code:cdsc_reddit into excise_reindex 2021-08-03 15:02:08 -07:00
Nate E TeBlunthuis cf86c7492c update clustering scripts 2021-08-03 14:55:02 -07:00
Nate E TeBlunthuis c6122bb429 Merge branch 'master' of code:cdsc_reddit 2021-07-28 15:32:21 -07:00
Nate E TeBlunthuis 596e1ff339 no longer do we need to get daily dumps 2021-07-28 15:32:04 -07:00
Nate E TeBlunthuis 87ffaa6858 script for picking the best clustering given constraints 2021-05-14 19:10:36 -07:00
Nate E TeBlunthuis 7b14db67de Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex 2021-05-13 22:28:31 -07:00
Nate E TeBlunthuis 0b95bea30e support isolates in visualization 2021-05-13 22:26:58 -07:00
Nate E TeBlunthuis 582cf263ea bug fix in affinity clustering 2021-05-13 22:26:15 -07:00
Nate E TeBlunthuis 8a2248fae1 Merge remote-tracking branch 'origin/excise_reindex' into temp 2021-05-10 18:32:03 -07:00
Nate E TeBlunthuis 47ba04aa97 add script for pulling cluster timeseries 2021-05-10 18:24:22 -07:00
Nate E TeBlunthuis 4cb7eeec80 Refactor to make a decent api. 2021-05-10 13:46:49 -07:00
Nate E TeBlunthuis f05cb962e0 refactor clustring in object oriented style 2021-05-07 22:33:26 -07:00
Nate E TeBlunthuis 8d1df5b26e refactor clustering.py into method-specific files. 2021-05-03 11:28:48 -07:00
Nate E TeBlunthuis e1c9d9af6f Remove 'exclude phrases' parameter. 2021-05-03 10:37:09 -07:00
Nate E TeBlunthuis 7df8436067 Use Latent semantic indexing and hdbscan 2021-05-02 23:39:55 -07:00
Nate E TeBlunthuis 36b24ee933 reindex tfidf in memory instead of using spark 2021-04-30 12:48:19 -07:00
Nate E TeBlunthuis 6a3bfa26ee bugfix 2021-04-26 22:31:05 -07:00
Nate E TeBlunthuis 3a758f1fc8 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch 2021-04-26 13:58:25 -07:00
Nate E TeBlunthuis 806cfc948f support passing in list of tfidf vectors. Also lowercases included subreddits. 2021-04-26 13:20:43 -07:00