Commit Graph

236 Commits

Author SHA1 Message Date
Nathan TeBlunthuis
9590e18a07 bugfix in filling in missing 2025-01-12 14:27:58 -08:00
Nathan TeBlunthuis
a7182ff3dc print debugging 2025-01-12 09:44:25 -08:00
Nathan TeBlunthuis
31aaa03079 add flag to run without overwriting completed parts. 2025-01-12 09:40:52 -08:00
Nathan TeBlunthuis
fcdd2d2272 bugfix 2025-01-12 02:39:05 -08:00
Nathan TeBlunthuis
9ae2d13573 bugfix 2025-01-12 01:17:21 -08:00
Nathan TeBlunthuis
3792f58d15 print debugging 2025-01-12 01:12:05 -08:00
Nathan TeBlunthuis
a9a4a6d90b bugfix 2025-01-12 01:07:57 -08:00
Nathan TeBlunthuis
1e2eeadb60 typo fix 2025-01-12 01:06:08 -08:00
Nathan TeBlunthuis
a9711fddf5 set nterms based on the new database 2025-01-12 01:03:52 -08:00
Nathan TeBlunthuis
f79eb28e31 fix f-string 2025-01-12 00:56:17 -08:00
Nathan TeBlunthuis
1fa2f6c4d2 bugfix 2025-01-12 00:54:34 -08:00
Nathan TeBlunthuis
8f0ce2dba7 bugfix 2025-01-12 00:52:38 -08:00
Nathan TeBlunthuis
2b4cb7fdf6 bugfix 2025-01-12 00:49:36 -08:00
Nathan TeBlunthuis
e568ee6db7 add parameters. 2025-01-12 00:47:47 -08:00
Nathan TeBlunthuis
b4f9ce0ad2 support remapping term_ids. 2025-01-12 00:44:16 -08:00
Nathan TeBlunthuis
72a4e686ef bugfix 2025-01-11 22:59:20 -08:00
Nathan TeBlunthuis
9c6d7429b2 fix bug. 2025-01-11 22:46:43 -08:00
Nathan TeBlunthuis
4c2ddc7455 bugfix 2025-01-11 21:50:07 -08:00
Nathan TeBlunthuis
1453a57d68 bugfix 2025-01-11 21:36:48 -08:00
Nathan TeBlunthuis
561a6704a3 make multiproc configurable 2025-01-11 21:21:53 -08:00
Nathan TeBlunthuis
b2f1c1342f tweak parallelism in hopes for speed. 2025-01-11 20:22:18 -08:00
Nathan TeBlunthuis
4168d0d4cf pass clusters param through 2025-01-11 20:09:19 -08:00
Nathan TeBlunthuis
dba0faf125 bugfix 2025-01-11 20:02:36 -08:00
Nathan TeBlunthuis
d0f37fe33a limit output to only the subreddits in clusters. 2025-01-11 19:52:54 -08:00
Nathan TeBlunthuis
9892315234 bugfix 2025-01-11 19:12:01 -08:00
Nathan TeBlunthuis
17defcd163 bugfix. 2025-01-11 19:07:45 -08:00
Nathan TeBlunthuis
ecc50f0249 spelling fix. 2025-01-11 18:59:42 -08:00
Nathan TeBlunthuis
0613193e9d support passing in a model object. 2025-01-11 18:59:25 -08:00
Nathan TeBlunthuis
3c1d5df97e add submissions to timeseries. 2025-01-10 06:20:38 -08:00
Nathan TeBlunthuis
81e12d1cef bugfix. 2024-12-31 14:41:27 -08:00
Nathan TeBlunthuis
c59d251d19 write clusters and read with spark instead of creating data frame. 2024-12-31 14:37:50 -08:00
Nathan TeBlunthuis
a8a86c2440 add timeseries code 2024-12-31 16:27:04 -06:00
Nathan TeBlunthuis
79d1826ba4 enforce min_df constraint in counting lsi features. 2024-12-30 16:17:31 -08:00
Nathan TeBlunthuis
3555542862 use min/max df constraints in counting nterms. 2024-12-30 16:10:50 -08:00
Nathan TeBlunthuis
a9b296dd73 bugfix 2024-12-28 20:18:53 -08:00
Nathan TeBlunthuis
d9db21686d remove unnecessary isoformat 2024-12-28 20:08:12 -08:00
Nathan TeBlunthuis
41fea31fce bugfix 2024-12-28 20:04:38 -08:00
Nathan TeBlunthuis
7aa22c7385 bugfix 2024-12-28 20:02:24 -08:00
Nathan TeBlunthuis
f11d4cfc72 use static tfidf (not weekly) to create tfidf matrix 2024-12-28 20:00:53 -08:00
Nathan TeBlunthuis
7b5ac73b2c use static tfidf (not weekly) to create tfidf matrix 2024-12-28 19:58:14 -08:00
Nathan TeBlunthuis
e2e7d7dbb1 more print debugging 2024-12-28 19:27:42 -08:00
Nathan TeBlunthuis
c317ef6475 debugging: print the shape 2024-12-28 19:21:24 -08:00
Nathan TeBlunthuis
c3cce0817e bugfix 2024-12-28 14:31:24 -08:00
Nathan TeBlunthuis
c9464f86f7 interface fix. 2024-12-28 14:27:56 -08:00
Nathan TeBlunthuis
f3db4efbb1 pass nterms as int. 2024-12-28 14:24:24 -08:00
Nathan TeBlunthuis
27f29e63fa typo fix. 2024-12-28 14:18:58 -08:00
Nathan TeBlunthuis
3f277ad99e pass weeks as strings. 2024-12-28 14:10:55 -08:00
Nathan TeBlunthuis
02ec11f726 no longer need to convert from spark dates into isoformat. 2024-12-28 13:55:54 -08:00
Nathan TeBlunthuis
104b708ff6 use duckdb not spark to prepare for weekly similarities. 2024-12-28 13:45:17 -08:00
Nathan TeBlunthuis
74ee86e443 add weekly_cosine_similarities script. 2024-12-25 21:15:38 -08:00