Commit Graph

  • 9590e18a07 bugfix in filling in missing master Nathan TeBlunthuis 2025-01-12 14:27:58 -0800
  • a7182ff3dc print debugging Nathan TeBlunthuis 2025-01-12 09:44:25 -0800
  • 31aaa03079 add flag to run without overwriting completed parts. Nathan TeBlunthuis 2025-01-12 09:40:52 -0800
  • fcdd2d2272 bugfix Nathan TeBlunthuis 2025-01-12 02:39:05 -0800
  • 9ae2d13573 bugfix Nathan TeBlunthuis 2025-01-12 01:17:21 -0800
  • 3792f58d15 print debugging Nathan TeBlunthuis 2025-01-12 01:12:05 -0800
  • a9a4a6d90b bugfix Nathan TeBlunthuis 2025-01-12 01:07:57 -0800
  • 1e2eeadb60 typo fix Nathan TeBlunthuis 2025-01-12 01:06:08 -0800
  • a9711fddf5 set nterms based on the new database Nathan TeBlunthuis 2025-01-12 01:03:52 -0800
  • f79eb28e31 fix f-string Nathan TeBlunthuis 2025-01-12 00:56:17 -0800
  • 1fa2f6c4d2 bugfix Nathan TeBlunthuis 2025-01-12 00:54:34 -0800
  • 8f0ce2dba7 bugfix Nathan TeBlunthuis 2025-01-12 00:52:38 -0800
  • 2b4cb7fdf6 bugfix Nathan TeBlunthuis 2025-01-12 00:49:36 -0800
  • e568ee6db7 add parameters. Nathan TeBlunthuis 2025-01-12 00:47:47 -0800
  • b4f9ce0ad2 support remapping term_ids. Nathan TeBlunthuis 2025-01-12 00:44:16 -0800
  • 72a4e686ef bugfix Nathan TeBlunthuis 2025-01-11 22:59:20 -0800
  • 9c6d7429b2 fix bug. Nathan TeBlunthuis 2025-01-11 22:46:43 -0800
  • 4c2ddc7455 bugfix Nathan TeBlunthuis 2025-01-11 21:50:07 -0800
  • 1453a57d68 bugfix Nathan TeBlunthuis 2025-01-11 21:36:48 -0800
  • 561a6704a3 make multiproc configurable Nathan TeBlunthuis 2025-01-11 21:21:53 -0800
  • b2f1c1342f tweak parallelism in hopes for speed. Nathan TeBlunthuis 2025-01-11 20:22:18 -0800
  • 4168d0d4cf pass clusters param through Nathan TeBlunthuis 2025-01-11 20:09:19 -0800
  • dba0faf125 bugfix Nathan TeBlunthuis 2025-01-11 20:02:36 -0800
  • d0f37fe33a limit output to only the subreddits in clusters. Nathan TeBlunthuis 2025-01-11 19:52:54 -0800
  • 9892315234 bugfix Nathan TeBlunthuis 2025-01-11 19:12:01 -0800
  • 17defcd163 bugfix. Nathan TeBlunthuis 2025-01-11 19:07:45 -0800
  • ecc50f0249 spelling fix. Nathan TeBlunthuis 2025-01-11 18:59:42 -0800
  • 0613193e9d support passing in a model object. Nathan TeBlunthuis 2025-01-11 18:57:02 -0800
  • 3c1d5df97e add submissions to timeseries. Nathan TeBlunthuis 2025-01-10 06:20:38 -0800
  • 81e12d1cef bugfix. Nathan TeBlunthuis 2024-12-31 14:41:27 -0800
  • c59d251d19 write clusters and read with spark instead of creating data frame. Nathan TeBlunthuis 2024-12-31 14:37:50 -0800
  • a8a86c2440 add timeseries code Nathan TeBlunthuis 2024-12-31 16:27:04 -0600
  • 79d1826ba4 enforce min_df constraint in counting lsi features. Nathan TeBlunthuis 2024-12-30 16:17:31 -0800
  • 3555542862 use min/max df constraints in counting nterms. Nathan TeBlunthuis 2024-12-30 16:10:50 -0800
  • a9b296dd73 bugfix Nathan TeBlunthuis 2024-12-28 20:18:53 -0800
  • d9db21686d remove unnecessary isoformat Nathan TeBlunthuis 2024-12-28 20:08:12 -0800
  • 41fea31fce bugfix Nathan TeBlunthuis 2024-12-28 20:04:38 -0800
  • 7aa22c7385 bugfix Nathan TeBlunthuis 2024-12-28 20:02:24 -0800
  • f11d4cfc72 use static tfidf (not weekly) to create tfidf matrix Nathan TeBlunthuis 2024-12-28 20:00:53 -0800
  • 7b5ac73b2c use static tfidf (not weekly) to create tfidf matrix Nathan TeBlunthuis 2024-12-28 19:58:14 -0800
  • e2e7d7dbb1 more print debugging Nathan TeBlunthuis 2024-12-28 19:27:42 -0800
  • c317ef6475 debugging: print the shape Nathan TeBlunthuis 2024-12-28 19:21:24 -0800
  • c3cce0817e bugfix Nathan TeBlunthuis 2024-12-28 14:31:24 -0800
  • c9464f86f7 interface fix. Nathan TeBlunthuis 2024-12-28 14:27:56 -0800
  • f3db4efbb1 pass nterms as int. Nathan TeBlunthuis 2024-12-28 14:24:24 -0800
  • 27f29e63fa typo fix. Nathan TeBlunthuis 2024-12-28 14:18:58 -0800
  • 3f277ad99e pass weeks as strings. Nathan TeBlunthuis 2024-12-28 14:10:55 -0800
  • 02ec11f726 no longer need to convert from spark dates into isoformat. Nathan TeBlunthuis 2024-12-28 13:55:54 -0800
  • 104b708ff6 use duckdb not spark to prepare for weekly similarities. Nathan TeBlunthuis 2024-12-28 13:45:17 -0800
  • 74ee86e443 add weekly_cosine_similarities script. Nathan TeBlunthuis 2024-12-25 21:15:38 -0800
  • a8a92d30df bugfix Nathan TeBlunthuis 2024-12-19 23:34:55 -0800
  • 638ab78375 comment out config. Nathan TeBlunthuis 2024-12-19 23:32:16 -0800
  • 8cb75c8354 typo fix. Nathan TeBlunthuis 2024-12-19 20:10:34 -0800
  • 0bbdc6bd5e typo fix. Nathan TeBlunthuis 2024-12-19 20:09:00 -0800
  • 8b69801c8d correct number of partitions. Nathan TeBlunthuis 2024-12-19 19:39:18 -0800
  • 189330198c repartition for parallelism. Nathan TeBlunthuis 2024-12-19 17:53:27 -0800
  • c6c9ec173b add shebang Nathan TeBlunthuis 2024-12-15 18:47:07 -0800
  • 52694e0498 typofix Nathan TeBlunthuis 2024-12-15 08:23:06 -0800
  • cb2f2c9717 make executable. Nathan TeBlunthuis 2024-12-15 08:18:42 -0800
  • 9a852b9300 was renamed to 'term_frequencies' prior to merge. Nathan TeBlunthuis 2024-12-12 07:54:28 -0800
  • 3d192ab82f Merge remote-tracking branch 'origin/icwsm_dataverse' Nathan TeBlunthuis 2024-12-12 07:45:06 -0800
  • e2b6c1b481 configure to use the g2-cpu node. Nathan TeBlunthuis 2024-12-12 07:17:10 -0800
  • f38ec6c129 smaller outchunk size. Nathan TeBlunthuis 2024-12-07 13:23:44 -0800
  • 25bfc57baf change path Nathan TeBlunthuis 2024-12-06 08:18:20 -0800
  • c3d2834110 use pyarrow instead of spark to write data Nathan TeBlunthuis 2024-12-06 08:09:02 -0800
  • 8224195432 bugfix. Nathan TeBlunthuis 2024-12-05 11:08:18 -0800
  • 5d70d3eb6d improve spark configuration. Nathan TeBlunthuis 2024-12-04 10:43:13 -0800
  • 89d03dd956 consistent naming and bugfix. Nathan TeBlunthuis 2024-12-04 09:24:45 -0800
  • 472849ebd9 correct output path. Nathan TeBlunthuis 2024-12-04 09:07:10 -0800
  • 85945eae90 correct paths. Nathan TeBlunthuis 2024-12-04 09:06:02 -0800
  • 1cca01fb69 use Path to make directories not os. Nathan TeBlunthuis 2024-12-04 07:47:47 -0800
  • 39c0fa7a29 bugfix. Nathan TeBlunthuis 2024-12-03 19:18:38 -0800
  • 0436450ea8 typo fix Nathan TeBlunthuis 2024-12-03 19:16:49 -0800
  • 4be8bb6bf5 bugfix Nathan TeBlunthuis 2024-12-03 19:15:07 -0800
  • ec5859c311 pass ngram_output through. Nathan TeBlunthuis 2024-12-03 19:05:44 -0800
  • a179d608eb bugfix. Nathan TeBlunthuis 2024-12-03 19:02:26 -0800
  • 73dd2a96a6 it's selftext not body Nathan TeBlunthuis 2024-12-03 18:59:27 -0800
  • 5045d6052e use post title and body in terms Nathan TeBlunthuis 2024-12-03 18:53:41 -0800
  • 51234f1070 add inpath param for tfidf_authors_weekly. Nathan TeBlunthuis 2024-12-03 10:16:23 -0800
  • 0a6ad65baf add shebang Nathan TeBlunthuis 2024-12-03 09:06:40 -0800
  • 7096785cb9 make exe Nathan TeBlunthuis 2024-12-03 09:05:44 -0800
  • 355d014d5f pass path into tfidf function. Nathan TeBlunthuis 2024-12-02 08:03:19 -0800
  • 5a131053af spark config tweaks. Nathan TeBlunthuis 2024-12-01 15:41:47 -0800
  • 224fb89317 bugfix. Nathan TeBlunthuis 2024-12-01 15:28:25 -0800
  • b25c332cea typo fix. Nathan TeBlunthuis 2024-12-01 15:27:16 -0800
  • 613059737a set os environment for big machine Nathan TeBlunthuis 2024-12-01 15:25:18 -0800
  • abe217d2d5 fix configuration code Nathan TeBlunthuis 2024-12-01 15:21:51 -0800
  • 9911f758f9 set memory usage. Nathan TeBlunthuis 2024-12-01 14:55:38 -0800
  • a31d8b26eb correct tf_name Nathan TeBlunthuis 2024-12-01 14:38:48 -0800
  • e40cc45d40 bugfix. Nathan TeBlunthuis 2024-12-01 14:10:47 -0800
  • d61746c9f7 make the output authors path. Nathan TeBlunthuis 2024-12-01 13:58:13 -0800
  • 9df9a8b8ff rename function. Nathan TeBlunthuis 2024-12-01 13:44:19 -0800
  • 3fea1f9388 sort and partition the term frequencies using spark. Nathan TeBlunthuis 2024-12-01 13:42:13 -0800
  • 2b023fea8d bugfix Nathan TeBlunthuis 2024-12-01 09:58:09 -0800
  • 88fca0f82b allow posts schemas to be nullable. Nathan TeBlunthuis 2024-12-01 09:55:12 -0800
  • 271cbea7d9 add a 'limit' parameter for testing. Nathan TeBlunthuis 2024-12-01 09:51:49 -0800
  • 4218bf864b debugging. Nathan TeBlunthuis 2024-12-01 09:39:50 -0800
  • 22d6a6961c allow authors to be null in submissions. Nathan TeBlunthuis 2024-11-27 20:04:05 -0800
  • a5ca25dd6e bugfix. Nathan TeBlunthuis 2024-11-27 19:56:06 -0800
  • 2e5181602b bugfix. Nathan TeBlunthuis 2024-11-27 19:53:04 -0800