Commit Graph

  • 0d7f4d3cec pass through stopWords. Nathan TeBlunthuis 2024-11-27 19:33:28 -0800
  • 5d48c0eb55 pass through mwe_tokenize Nathan TeBlunthuis 2024-11-27 19:31:59 -0800
  • 91cc1edf02 pass through mwe_pass Nathan TeBlunthuis 2024-11-27 19:20:49 -0800
  • 2decdc9750 move function to outer scope. Nathan TeBlunthuis 2024-11-27 19:13:49 -0800
  • 7da046735b move function to outer scope. Nathan TeBlunthuis 2024-11-27 19:10:34 -0800
  • 0631256956 make the output directory. Nathan TeBlunthuis 2024-11-27 19:06:24 -0800
  • 8cb9683bc2 bugfix Nathan TeBlunthuis 2024-11-27 19:03:52 -0800
  • 587e1c0022 bugfix. Nathan TeBlunthuis 2024-11-27 18:56:22 -0800
  • 78eb16f4d6 more path munging. Nathan TeBlunthuis 2024-11-27 18:53:16 -0800
  • a0a6a08bf2 handle case where we're in a parent directory. Nathan TeBlunthuis 2024-11-27 18:49:03 -0800
  • a84b633641 add absolute path to call. Nathan TeBlunthuis 2024-11-27 18:42:29 -0800
  • ce7b5f92eb bugfix. Nathan TeBlunthuis 2024-11-27 17:20:04 -0800
  • fbf905c740 rename file Nathan TeBlunthuis 2024-11-27 11:55:31 -0800
  • dd894ebf61 support posts in ngrams Nathan TeBlunthuis 2024-11-27 11:51:22 -0800
  • 9345f9de94 make pass keyword arg to dataframe.drop icwsm_dataverse Nathan TeBlunthuis 2023-05-31 09:47:21 -0700
  • 07b0dff9bc changes for archiving. Nathan TeBlunthuis 2023-05-23 17:18:19 -0700
  • 811a0d87c4 changes from dirty branch. Nathan TeBlunthuis 2023-05-18 10:29:08 -0700
  • c190791364 add 2 more umap parameters excise_reindex Nathan TeBlunthuis 2022-06-08 17:27:37 -0700
  • 5a40465a62 add support for umap->hdbscan clustering method Nathan TeBlunthuis 2022-06-08 17:01:27 -0700
  • 55b75ea6fc Merge remote-tracking branch 'refs/remotes/origin/excise_reindex' into excise_reindex synced/excise_reindex Nathan TeBlunthuis 2022-04-06 11:14:13 -0700
  • d6d4f1376b update synced/git-annex git-annex Nathan TeBlunthuis 2022-04-06 11:11:13 -0700
  • 197518a222 git-annex in Nathan TeBlunthuis 2022-04-06 11:11:11 -0700
  • 1e26083dcc merging origin/git-annex origin/synced/git-annex into git-annex Nathan TeBlunthuis 2022-04-06 11:08:45 -0700
  • c8f63649a3 branch created Nathan TeBlunthuis 2022-04-06 11:08:45 -0700
  • 53f5b8c03c add note to try other tf normalization strategies. Nathan TeBlunthuis 2022-03-31 12:17:16 -0700
  • 930ee47d2b refactor similarities to use submodule. factor_out_similarities Nathan TeBlunthuis 2022-01-19 15:05:49 -0800
  • 65deba5e4e Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2022-01-19 14:01:44 -0800
  • 7b130a30af commit changes from smap project. Nathan TeBlunthuis 2022-01-19 13:57:02 -0800
  • 98c1317af5 update pushshift dumps. Nathan TeBlunthuis 2021-12-10 21:23:32 -0800
  • 541e125b28 lsi support for weekly similarities Nathan TeBlunthuis 2021-08-11 22:48:33 -0700
  • b7c39a3494 Merge branch 'master' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:13:39 -0700
  • ce549c6c97 Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:13:21 -0700
  • 6e43294a41 Updates to similarities code for smap project. Nathan TeBlunthuis 2021-08-03 15:06:48 -0700
  • 14ab979f59 Merge branch 'master' of code:cdsc_reddit Nathan TeBlunthuis 2021-08-03 15:03:40 -0700
  • 2d21ff1137 Merge branch 'master' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:02:08 -0700
  • cf86c7492c update clustering scripts Nate E TeBlunthuis 2021-08-03 14:55:02 -0700
  • c6122bb429 Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-07-28 15:32:21 -0700
  • 596e1ff339 no longer do we need to get daily dumps Nate E TeBlunthuis 2021-07-28 15:32:04 -0700
  • 87ffaa6858 script for picking the best clustering given constraints Nate E TeBlunthuis 2021-05-14 19:10:36 -0700
  • 7b14db67de Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nate E TeBlunthuis 2021-05-13 22:28:31 -0700
  • 0b95bea30e support isolates in visualization Nate E TeBlunthuis 2021-05-13 22:26:58 -0700
  • 582cf263ea bug fix in affinity clustering Nate E TeBlunthuis 2021-05-13 22:26:03 -0700
  • 8a2248fae1 Merge remote-tracking branch 'origin/excise_reindex' into temp Nate E TeBlunthuis 2021-05-10 18:32:03 -0700
  • 47ba04aa97 add script for pulling cluster timeseries Nate E TeBlunthuis 2021-05-10 18:24:22 -0700
  • 4cb7eeec80 Refactor to make a decent api. Nate E TeBlunthuis 2021-05-10 13:46:49 -0700
  • f05cb962e0 refactor clustering in object-oriented style Nate E TeBlunthuis 2021-05-07 22:33:26 -0700
  • 8d1df5b26e refactor clustering.py into method-specific files. Nate E TeBlunthuis 2021-05-03 11:28:48 -0700
  • e1c9d9af6f Remove 'exclude phrases' parameter. Nate E TeBlunthuis 2021-05-03 10:37:09 -0700
  • f728292461 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch charliepatch Nate E TeBlunthuis 2021-05-02 23:56:16 -0700
  • 95905cfc8b Merge branch 'excise_reindex' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-05-02 23:52:52 -0700
  • 7df8436067 Use Latent semantic indexing and hdbscan Nate E TeBlunthuis 2021-05-02 23:39:55 -0700
  • 36b24ee933 reindex tfidf in memory instead of using spark Nate E TeBlunthuis 2021-04-30 12:48:19 -0700
  • 6a3bfa26ee bugfix Nate E TeBlunthuis 2021-04-26 22:31:05 -0700
  • 3a758f1fc8 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-04-26 13:22:29 -0700
  • 46623927fe Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-04-26 13:22:29 -0700
  • 806cfc948f support passing in list of tfidf vectors. Nate E TeBlunthuis 2021-04-26 11:16:28 -0700
  • 0fe120e4ab support passing in list of tfidf vectors. Nate E TeBlunthuis 2021-04-26 11:16:28 -0700
  • f20365c07e Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-04-22 10:46:26 -0700
  • 34e0a0a30d version of weekly_cosine_similarities.py from klone Nate E TeBlunthuis 2021-04-22 10:38:10 -0700
  • 003a48aea5 bugfix in weekly similarities Nate E TeBlunthuis 2021-04-22 10:37:04 -0700
  • 37dd0ef55f bugfixes in clustering selection. Nate E TeBlunthuis 2021-04-21 16:56:25 -0700
  • ac06a8757a calculate some user-level attributes to detect bots Nate E TeBlunthuis 2021-04-20 11:34:36 -0700
  • 01a4c35358 grid sweep selection for clustering hyperparameters Nate E TeBlunthuis 2021-04-20 11:33:54 -0700
  • 628a70734b Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-04-05 23:21:35 -0700
  • f0176d9f0d Changes for cosine similarities on klone. Nate E TeBlunthuis 2021-04-05 23:21:06 -0700
  • a013f6718b export timeseries functions Nate E TeBlunthuis 2021-03-24 17:18:30 -0700
  • 36cb0a5546 add code for pulling activity time series from parquet. Nate E TeBlunthuis 2021-03-24 16:08:57 -0700
  • 06430903f0 add included_subreddits parameter to cosine similarities. Nate E TeBlunthuis 2021-02-22 18:38:34 -0800
  • 4dc949de5f Changes from hyak. Nate E TeBlunthuis 2021-02-22 16:03:48 -0800
  • 140d1bdd17 fix bug in viz. Nate E TeBlunthuis 2021-01-27 20:26:15 -0800
  • 554660275f add visualization for 10000 subreddits based on author-tf similarities. Nate E TeBlunthuis 2021-01-27 20:22:24 -0800
  • b4dd9acbd8 Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-01-27 20:09:23 -0800
  • dbe4c87f8b add cluster selection to visualization Nathan TeBlunthuis 2021-01-27 20:08:07 -0800
  • 3155600514 remove nsfw subs from topN Nate E TeBlunthuis 2020-12-28 21:11:44 -0800
  • 4e20dce188 Updating to support wang-style user overlaps. Nate E TeBlunthuis 2020-12-24 22:38:04 -0800
  • 56269deee3 Some improvements to run affinity clustering on larger dataset and compute density. Nate E TeBlunthuis 2020-12-12 20:42:47 -0800
  • e6294b5b90 Refactor and reorganize. Nate E TeBlunthuis 2020-12-08 17:32:20 -0800
  • a60747292e Add code for running tf-idf at the weekly level. Nate E TeBlunthuis 2020-12-01 22:54:48 -0800
  • db5879d6c9 refactor visualization code. Nathan TeBlunthuis 2020-11-17 16:46:49 -0800
  • c930c6c8ef update git repository hosting 2020-11-17 16:33:16 -0800
  • 5f7fb2bca6 update Nathan TeBlunthuis 2020-11-17 16:33:15 -0800
  • 13eb95b3b0 Merge remote-tracking branch 'refs/remotes/origin/master' into master synced/master Nathan TeBlunthuis 2020-11-17 16:33:14 -0800
  • 52d7540e65 merging origin/git-annex into git-annex Nathan TeBlunthuis 2020-11-17 16:33:14 -0800
  • 2cc897543a git-annex in nathante@nate-x1:~/cdsc_reddit Nathan TeBlunthuis 2020-11-17 16:33:13 -0800
  • 4c081b8166 merging code/git-annex into git-annex Nate E TeBlunthuis 2020-11-17 16:32:27 -0800
  • 37c9af47ea update git repository hosting 2020-11-17 16:32:13 -0800
  • 77c177a31a update Nate E TeBlunthuis 2020-11-17 16:32:12 -0800
  • 1bf206d219 git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit Nate E TeBlunthuis 2020-11-17 16:31:48 -0800
  • 1e294cf8ff update Nate E TeBlunthuis 2020-11-17 16:31:37 -0800
  • fa40a0ea99 update Nate E TeBlunthuis 2020-11-17 16:16:31 -0800
  • 552dab0637 update Nate E TeBlunthuis 2020-11-17 16:08:56 -0800
  • d4b99344d8 update Nate E TeBlunthuis 2020-11-17 16:08:46 -0800
  • f8ff8b2d0f Update code for clustering + tsne. Nate E TeBlunthuis 2020-11-17 15:59:20 -0800
  • 82d184d9c6 Update code for building similarity matrices. Nate E TeBlunthuis 2020-11-17 12:52:48 -0800
  • 07a14caf01 merging code/git-annex into git-annex Nate E TeBlunthuis 2020-11-16 13:46:36 -0800
  • 1127bf4c9d update Nathan TeBlunthuis 2020-11-12 21:15:17 -0800
  • 4da12c4d3a update Nathan TeBlunthuis 2020-11-12 21:04:38 -0800
  • 7d014ca95d update Nathan TeBlunthuis 2020-11-12 16:39:48 -0800
  • e794214653 bugfix in completing tfidf similarity matrices. Nate E TeBlunthuis 2020-11-12 11:47:53 -0800
  • ce3cbb65e4 update Nate E TeBlunthuis 2020-11-12 11:47:34 -0800