Commit Graph

29 Commits

Author SHA1 Message Date
Nathan TeBlunthuis
d0f37fe33a limit output to only the subreddits in clusters. 2025-01-11 19:52:54 -08:00
Nathan TeBlunthuis
17defcd163 bugfix. 2025-01-11 19:07:45 -08:00
Nathan TeBlunthuis
0613193e9d support passing in a model object. 2025-01-11 18:59:25 -08:00
Nathan TeBlunthuis
79d1826ba4 enforce min_df constraint in counting lsi features. 2024-12-30 16:17:31 -08:00
Nathan TeBlunthuis
3555542862 use min/max df constraints in counting nterms. 2024-12-30 16:10:50 -08:00
Nathan TeBlunthuis
41fea31fce bugfix 2024-12-28 20:04:38 -08:00
Nathan TeBlunthuis
7aa22c7385 bugfix 2024-12-28 20:02:24 -08:00
Nathan TeBlunthuis
f11d4cfc72 use static tfidf (not weekly) to create tfidf matrix 2024-12-28 20:00:53 -08:00
Nathan TeBlunthuis
7b5ac73b2c use static tfidf (not weekly) to create tfidf matrix 2024-12-28 19:58:14 -08:00
Nathan TeBlunthuis
e2e7d7dbb1 more print debugging 2024-12-28 19:27:42 -08:00
Nathan TeBlunthuis
c317ef6475 debugging: print the shape 2024-12-28 19:21:24 -08:00
Nathan TeBlunthuis
c3cce0817e bugfix 2024-12-28 14:31:24 -08:00
Nathan TeBlunthuis
c9464f86f7 interface fix. 2024-12-28 14:27:56 -08:00
Nathan TeBlunthuis
f3db4efbb1 pass nterms as int. 2024-12-28 14:24:24 -08:00
Nathan TeBlunthuis
27f29e63fa typo fix. 2024-12-28 14:18:58 -08:00
Nathan TeBlunthuis
3f277ad99e pass weeks as strings. 2024-12-28 14:10:55 -08:00
Nathan TeBlunthuis
02ec11f726 no longer need to convert from spark dates into isoformat. 2024-12-28 13:55:54 -08:00
Nathan TeBlunthuis
104b708ff6 use duckdb not spark to prepare for weekly similarities. 2024-12-28 13:45:17 -08:00
Nathan TeBlunthuis
74ee86e443 add weekly_cosine_similarities script. 2024-12-25 21:15:38 -08:00
07b0dff9bc changes for archiving. 2023-05-23 17:18:19 -07:00
7b130a30af commit changes from smap project. 2022-01-19 13:57:02 -08:00
541e125b28 lsi support for weekly similarities 2021-08-11 22:48:33 -07:00
6e43294a41 Updates to similarities code for smap project. 2021-08-03 15:06:48 -07:00
Nate E TeBlunthuis
7df8436067 Use Latent semantic indexing and hdbscan 2021-05-02 23:39:55 -07:00
Nate E TeBlunthuis
003a48aea5 bugfix in weekly similarities 2021-04-22 10:37:04 -07:00
Nate E TeBlunthuis
f0176d9f0d Changes for cosine similarities on klone. 2021-04-05 23:21:06 -07:00
Nate E TeBlunthuis
4e20dce188 Updating to support wang-style user overlaps. 2020-12-24 22:38:04 -08:00
Nate E TeBlunthuis
56269deee3 Some improvements to run affinity clustering on larger dataset and compute density. 2020-12-12 20:42:47 -08:00
Nate E TeBlunthuis
e6294b5b90 Refactor and reorganze. 2020-12-08 17:32:20 -08:00