Commit Graph

  • 9345f9de94 make pass keyword arg to dataframe.drop icwsm_dataverse Nathan TeBlunthuis 2023-05-31 09:47:21 -07:00
  • 07b0dff9bc changes for archiving. Nathan TeBlunthuis 2023-05-23 17:18:19 -07:00
  • 811a0d87c4 changes from dirty branch. Nathan TeBlunthuis 2023-05-18 10:29:08 -07:00
  • c190791364 add 2 more umap parameters excise_reindex Nathan TeBlunthuis 2022-06-08 17:27:37 -07:00
  • 5a40465a62 add support for umap->hdbscan clustering method Nathan TeBlunthuis 2022-06-08 17:01:27 -07:00
  • 55b75ea6fc Merge remote-tracking branch 'refs/remotes/origin/excise_reindex' into excise_reindex synced/excise_reindex Nathan TeBlunthuis 2022-04-06 11:14:13 -07:00
  • d6d4f1376b update synced/git-annex git-annex Nathan TeBlunthuis 2022-04-06 11:11:13 -07:00
  • 197518a222 git-annex in Nathan TeBlunthuis 2022-04-06 11:11:11 -07:00
  • 1e26083dcc merging origin/git-annex origin/synced/git-annex into git-annex Nathan TeBlunthuis 2022-04-06 11:08:45 -07:00
  • c8f63649a3 branch created Nathan TeBlunthuis 2022-04-06 11:08:45 -07:00
  • 53f5b8c03c add note to try other tf normalization strategies. master Nathan TeBlunthuis 2022-03-31 12:17:16 -07:00
  • 930ee47d2b refactor similarities to use submodule. factor_out_similarities Nathan TeBlunthuis 2022-01-19 15:05:49 -08:00
  • 65deba5e4e Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2022-01-19 14:01:44 -08:00
  • 7b130a30af commit changes from smap project. Nathan TeBlunthuis 2022-01-19 13:57:02 -08:00
  • 98c1317af5 update pushshift dumps. Nathan TeBlunthuis 2021-12-10 21:23:32 -08:00
  • 541e125b28 lsi support for weekly similarities Nathan TeBlunthuis 2021-08-11 22:48:33 -07:00
  • b7c39a3494 Merge branch 'master' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:13:39 -07:00
  • ce549c6c97 Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:13:21 -07:00
  • 6e43294a41 Updates to similarities code for smap project. Nathan TeBlunthuis 2021-08-03 15:06:48 -07:00
  • 14ab979f59 Merge branch 'master' of code:cdsc_reddit Nathan TeBlunthuis 2021-08-03 15:03:40 -07:00
  • 2d21ff1137 Merge branch 'master' of code:cdsc_reddit into excise_reindex Nathan TeBlunthuis 2021-08-03 15:02:08 -07:00
  • cf86c7492c update clustering scripts Nate E TeBlunthuis 2021-08-03 14:55:02 -07:00
  • c6122bb429 Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-07-28 15:32:21 -07:00
  • 596e1ff339 no longer do we need to get daily dumps Nate E TeBlunthuis 2021-07-28 15:32:04 -07:00
  • 87ffaa6858 script for picking the best clustering given constraints Nate E TeBlunthuis 2021-05-14 19:10:36 -07:00
  • 7b14db67de Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex Nate E TeBlunthuis 2021-05-13 22:28:31 -07:00
  • 0b95bea30e support isolates in visualization Nate E TeBlunthuis 2021-05-13 22:26:58 -07:00
  • 582cf263ea bug fix in affinity clustering Nate E TeBlunthuis 2021-05-13 22:26:03 -07:00
  • 8a2248fae1 Merge remote-tracking branch 'origin/excise_reindex' into temp Nate E TeBlunthuis 2021-05-10 18:32:03 -07:00
  • 47ba04aa97 add script for pulling cluster timeseries Nate E TeBlunthuis 2021-05-10 18:24:22 -07:00
  • 4cb7eeec80 Refactor to make a decent api. Nate E TeBlunthuis 2021-05-10 13:46:49 -07:00
  • f05cb962e0 refactor clustring in object oriented style Nate E TeBlunthuis 2021-05-07 22:33:26 -07:00
  • 8d1df5b26e refactor clustering.py into method-specific files. Nate E TeBlunthuis 2021-05-03 11:28:48 -07:00
  • e1c9d9af6f Remove 'exclude phrases' parameter. Nate E TeBlunthuis 2021-05-03 10:37:09 -07:00
  • f728292461 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch origin/charliepatch charliepatch Nate E TeBlunthuis 2021-05-02 23:56:16 -07:00
  • 95905cfc8b Merge branch 'excise_reindex' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-05-02 23:52:52 -07:00
  • 7df8436067 Use Latent semantic indexing and hdbscan Nate E TeBlunthuis 2021-05-02 23:39:55 -07:00
  • 36b24ee933 reindex tfidf in memory instead of using spark Nate E TeBlunthuis 2021-04-30 12:48:19 -07:00
  • 6a3bfa26ee bugfix Nate E TeBlunthuis 2021-04-26 22:31:05 -07:00
  • 3a758f1fc8 Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-04-26 13:22:29 -07:00
  • 46623927fe Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch Nate E TeBlunthuis 2021-04-26 13:22:29 -07:00
  • 806cfc948f support passing in list of tfidf vectors. Nate E TeBlunthuis 2021-04-26 11:16:28 -07:00
  • 0fe120e4ab support passing in list of tfidf vectors. Nate E TeBlunthuis 2021-04-26 11:16:28 -07:00
  • f20365c07e Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-04-22 10:46:26 -07:00
  • 34e0a0a30d version of weekly_cosine_similarities.py from klone Nate E TeBlunthuis 2021-04-22 10:38:10 -07:00
  • 003a48aea5 bugfix in weekly similarities Nate E TeBlunthuis 2021-04-22 10:37:04 -07:00
  • 37dd0ef55f bugfixes in clustering selection. Nate E TeBlunthuis 2021-04-21 16:56:25 -07:00
  • ac06a8757a calculate some user-level attributes to detect bots Nate E TeBlunthuis 2021-04-20 11:34:36 -07:00
  • 01a4c35358 grid sweep selection for clustering hyperparameters Nate E TeBlunthuis 2021-04-20 11:33:54 -07:00
  • 628a70734b Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-04-05 23:21:35 -07:00
  • f0176d9f0d Changes for cosine similarities on klone. Nate E TeBlunthuis 2021-04-05 23:21:06 -07:00
  • a013f6718b export timeseries functions Nate E TeBlunthuis 2021-03-24 17:18:30 -07:00
  • 36cb0a5546 add code for pulling activity time series from parquet. Nate E TeBlunthuis 2021-03-24 16:08:57 -07:00
  • 06430903f0 add included_subreddits parameter to cosine similarities. Nate E TeBlunthuis 2021-02-22 18:38:34 -08:00
  • 4dc949de5f Changes from hyak. Nate E TeBlunthuis 2021-02-22 16:03:48 -08:00
  • 140d1bdd17 fix bug in viz. Nate E TeBlunthuis 2021-01-27 20:26:15 -08:00
  • 554660275f add visualization for 10000 subreddits based on author-tf similarities. Nate E TeBlunthuis 2021-01-27 20:22:24 -08:00
  • b4dd9acbd8 Merge branch 'master' of code:cdsc_reddit Nate E TeBlunthuis 2021-01-27 20:09:23 -08:00
  • dbe4c87f8b add cluster selection to visualization Nathan TeBlunthuis 2021-01-27 20:08:07 -08:00
  • 3155600514 remove nsfw subs from topN Nate E TeBlunthuis 2020-12-28 21:11:44 -08:00
  • 4e20dce188 Updating to support wang-style user overlaps. Nate E TeBlunthuis 2020-12-24 22:38:04 -08:00
  • 56269deee3 Some improvements to run affinity clustering on larger dataset and compute density. Nate E TeBlunthuis 2020-12-12 20:42:47 -08:00
  • e6294b5b90 Refactor and reorganze. Nate E TeBlunthuis 2020-12-08 17:32:20 -08:00
  • a60747292e Add code for running tf-idf at the weekly level. Nate E TeBlunthuis 2020-12-01 22:54:48 -08:00
  • db5879d6c9 refactor visualization code. Nathan TeBlunthuis 2020-11-17 16:46:49 -08:00
  • c930c6c8ef update git repository hosting 2020-11-17 16:33:16 -08:00
  • 5f7fb2bca6 update Nathan TeBlunthuis 2020-11-17 16:33:15 -08:00
  • 13eb95b3b0 Merge remote-tracking branch 'refs/remotes/origin/master' into master synced/master Nathan TeBlunthuis 2020-11-17 16:33:14 -08:00
  • 52d7540e65 merging origin/git-annex into git-annex Nathan TeBlunthuis 2020-11-17 16:33:14 -08:00
  • 2cc897543a git-annex in nathante@nate-x1:~/cdsc_reddit Nathan TeBlunthuis 2020-11-17 16:33:13 -08:00
  • 4c081b8166 merging code/git-annex into git-annex Nate E TeBlunthuis 2020-11-17 16:32:27 -08:00
  • 37c9af47ea update git repository hosting 2020-11-17 16:32:13 -08:00
  • 77c177a31a update Nate E TeBlunthuis 2020-11-17 16:32:12 -08:00
  • 1bf206d219 git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit Nate E TeBlunthuis 2020-11-17 16:31:48 -08:00
  • 1e294cf8ff update Nate E TeBlunthuis 2020-11-17 16:31:37 -08:00
  • fa40a0ea99 update Nate E TeBlunthuis 2020-11-17 16:16:31 -08:00
  • 552dab0637 update Nate E TeBlunthuis 2020-11-17 16:08:56 -08:00
  • d4b99344d8 update Nate E TeBlunthuis 2020-11-17 16:08:46 -08:00
  • f8ff8b2d0f Update code for clustering + tsne. Nate E TeBlunthuis 2020-11-17 15:59:20 -08:00
  • 82d184d9c6 Update code for building simlarity matrices. Nate E TeBlunthuis 2020-11-17 12:52:48 -08:00
  • 07a14caf01 merging code/git-annex into git-annex Nate E TeBlunthuis 2020-11-16 13:46:36 -08:00
  • 1127bf4c9d update Nathan TeBlunthuis 2020-11-12 21:15:17 -08:00
  • 4da12c4d3a update Nathan TeBlunthuis 2020-11-12 21:04:38 -08:00
  • 7d014ca95d update Nathan TeBlunthuis 2020-11-12 16:39:48 -08:00
  • e794214653 bugfix in completing tfidf similarity matrices. Nate E TeBlunthuis 2020-11-12 11:47:53 -08:00
  • ce3cbb65e4 update Nate E TeBlunthuis 2020-11-12 11:47:34 -08:00
  • 8268f8424b update Nathan TeBlunthuis 2020-11-11 17:15:25 -08:00
  • 4b0b898d4d update Nathan TeBlunthuis 2020-11-11 17:11:28 -08:00
  • a824d312d2 update Nathan TeBlunthuis 2020-11-11 17:08:40 -08:00
  • 8929699397 update Nathan TeBlunthuis 2020-11-11 16:59:12 -08:00
  • b838900de8 update git repository hosting 2020-11-11 16:59:04 -08:00
  • b2d3bb0129 update Nate E TeBlunthuis 2020-11-11 16:59:03 -08:00
  • 9d3d24c33a merging code/git-annex into git-annex Nate E TeBlunthuis 2020-11-11 16:59:03 -08:00
  • 220a540beb increase learning rate. Nate E TeBlunthuis 2020-11-11 16:58:39 -08:00
  • 566a031593 update Nate E TeBlunthuis 2020-11-11 16:58:32 -08:00
  • cd43a94865 increase iterations and perplectity and early_exaggeration Nate E TeBlunthuis 2020-11-11 16:55:39 -08:00
  • 06b9dbe9c2 update Nate E TeBlunthuis 2020-11-11 16:55:31 -08:00
  • 1087418928 update Nathan TeBlunthuis 2020-11-11 16:49:11 -08:00
  • 9451b7018d update git repository hosting 2020-11-11 16:49:01 -08:00
  • 17fdf46fda update Nate E TeBlunthuis 2020-11-11 16:48:59 -08:00