|
53f5b8c03c
|
add note to try other tf normalization strategies.
|
2022-03-31 12:17:16 -07:00 |
|
|
14ab979f59
|
Merge branch 'master' of code:cdsc_reddit
|
2021-08-03 15:03:40 -07:00 |
|
Nate E TeBlunthuis
|
c6122bb429
|
Merge branch 'master' of code:cdsc_reddit
|
2021-07-28 15:32:21 -07:00 |
|
Nate E TeBlunthuis
|
596e1ff339
|
no longer do we need to get daily dumps
|
2021-07-28 15:32:04 -07:00 |
|
Nate E TeBlunthuis
|
6a3bfa26ee
|
bugfix
|
2021-04-26 22:31:05 -07:00 |
|
Nate E TeBlunthuis
|
3a758f1fc8
|
Merge branch 'charliepatch' of code:cdsc_reddit into charliepatch
|
2021-04-26 13:58:25 -07:00 |
|
Nate E TeBlunthuis
|
806cfc948f
|
support passing in list of tfidf vectors.
Also lowercases included subreddits.
|
2021-04-26 13:20:43 -07:00 |
|
Nate E TeBlunthuis
|
0fe120e4ab
|
support passing in list of tfidf vectors.
Also lowercases included subreddits.
|
2021-04-26 11:44:56 -07:00 |
|
Nate E TeBlunthuis
|
f20365c07e
|
Merge branch 'master' of code:cdsc_reddit
|
2021-04-22 10:46:26 -07:00 |
|
Nate E TeBlunthuis
|
34e0a0a30d
|
version of weekly_cosine_similarities.py from klone
|
2021-04-22 10:38:10 -07:00 |
|
Nate E TeBlunthuis
|
003a48aea5
|
bugfix in weekly similarities
|
2021-04-22 10:37:04 -07:00 |
|
Nate E TeBlunthuis
|
37dd0ef55f
|
bugfixes in clustering selection.
|
2021-04-21 16:56:25 -07:00 |
|
Nate E TeBlunthuis
|
ac06a8757a
|
calculate some user-level attributes to detect bots
|
2021-04-20 11:34:36 -07:00 |
|
Nate E TeBlunthuis
|
01a4c35358
|
grid sweep selection for clustering hyperparameters
|
2021-04-20 11:33:54 -07:00 |
|
Nate E TeBlunthuis
|
628a70734b
|
Merge branch 'master' of code:cdsc_reddit
|
2021-04-05 23:21:35 -07:00 |
|
Nate E TeBlunthuis
|
f0176d9f0d
|
Changes for cosine similarities on klone.
|
2021-04-05 23:21:06 -07:00 |
|
Nate E TeBlunthuis
|
36cb0a5546
|
add code for pulling activity time series from parquet.
|
2021-03-24 16:08:57 -07:00 |
|
Nate E TeBlunthuis
|
06430903f0
|
add included_subreddits parameter to cosine similarities.
|
2021-02-22 18:38:34 -08:00 |
|
Nate E TeBlunthuis
|
4dc949de5f
|
Changes from hyak.
|
2021-02-22 16:03:48 -08:00 |
|
Nate E TeBlunthuis
|
140d1bdd17
|
fix bug in viz.
|
2021-01-27 20:26:15 -08:00 |
|
Nate E TeBlunthuis
|
554660275f
|
add visualization for 10000 subreddits based on author-tf similarities.
|
2021-01-27 20:22:24 -08:00 |
|
Nate E TeBlunthuis
|
b4dd9acbd8
|
Merge branch 'master' of code:cdsc_reddit
|
2021-01-27 20:09:23 -08:00 |
|
|
dbe4c87f8b
|
add cluster selection to visualization
|
2021-01-27 20:08:07 -08:00 |
|
Nate E TeBlunthuis
|
3155600514
|
remove nsfw subs from topN
|
2020-12-28 21:11:44 -08:00 |
|
Nate E TeBlunthuis
|
4e20dce188
|
Updating to support wang-style user overlaps.
|
2020-12-24 22:38:04 -08:00 |
|
Nate E TeBlunthuis
|
56269deee3
|
Some improvements to run affinity clustering on larger dataset and
compute density.
|
2020-12-12 20:42:47 -08:00 |
|
Nate E TeBlunthuis
|
e6294b5b90
|
Refactor and reorganize.
|
2020-12-08 17:32:20 -08:00 |
|
Nate E TeBlunthuis
|
a60747292e
|
Add code for running tf-idf at the weekly level.
|
2020-12-01 22:54:48 -08:00 |
|
|
db5879d6c9
|
refactor visualization code.
|
2020-11-17 16:46:49 -08:00 |
|
|
13eb95b3b0
|
Merge remote-tracking branch 'refs/remotes/origin/master' into master
|
2020-11-17 16:33:14 -08:00 |
|
|
2cc897543a
|
git-annex in nathante@nate-x1:~/cdsc_reddit
|
2020-11-17 16:33:13 -08:00 |
|
Nate E TeBlunthuis
|
1bf206d219
|
git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit
|
2020-11-17 16:31:48 -08:00 |
|
Nate E TeBlunthuis
|
f8ff8b2d0f
|
Update code for clustering + tsne.
|
2020-11-17 15:59:20 -08:00 |
|
Nate E TeBlunthuis
|
82d184d9c6
|
Update code for building similarity matrices.
|
2020-11-17 12:52:48 -08:00 |
|
Nate E TeBlunthuis
|
e794214653
|
bugfix in completing tfidf similarity matrices.
|
2020-11-12 11:47:53 -08:00 |
|
Nate E TeBlunthuis
|
220a540beb
|
increase learning rate.
|
2020-11-11 16:58:39 -08:00 |
|
Nate E TeBlunthuis
|
cd43a94865
|
increase iterations and perplexity and early_exaggeration
|
2020-11-11 16:55:39 -08:00 |
|
Nate E TeBlunthuis
|
ca6a8f0896
|
increase learning rate
|
2020-11-11 16:48:41 -08:00 |
|
Nate E TeBlunthuis
|
ed0e1a8235
|
Fix bug in tsne.
|
2020-11-11 16:43:41 -08:00 |
|
Nate E TeBlunthuis
|
6baa08889b
|
git-annex in nathante@mox2.hyak.local:/gscratch/comdata/users/nathante/cdsc-reddit
|
2020-11-11 16:39:44 -08:00 |
|
Nate E TeBlunthuis
|
4447c60265
|
split fitting and plotting tsne.
|
2020-11-11 16:38:22 -08:00 |
|
|
db53c0138a
|
Add file to plot related subreddits using tsne.
|
2020-11-11 16:05:36 -08:00 |
|
Nate E TeBlunthuis
|
4c8bd14992
|
Bugfix (typo)
|
2020-11-10 13:38:11 -08:00 |
|
Nate E TeBlunthuis
|
39c581bee9
|
Reuse code for term and author cosine similarity.
|
2020-11-10 13:18:57 -08:00 |
|
Nate E TeBlunthuis
|
5632a971c6
|
Refactor tfidf code for code reuse.
|
2020-11-10 13:18:19 -08:00 |
|
Nate E TeBlunthuis
|
772f3a8fbd
|
rename 'idf' files to 'tfidf'
|
2020-11-10 13:16:55 -08:00 |
|
Nate E TeBlunthuis
|
6edd155749
|
Improvements to idf code
|
2020-11-10 13:12:11 -08:00 |
|
Nate E TeBlunthuis
|
8b8c45ee2d
|
Merge branch 'master' of code:cdsc_reddit
|
2020-11-02 10:40:12 -08:00 |
|
Nate E TeBlunthuis
|
3dc17bd27c
|
add term_cosine_similarity.py
|
2020-11-02 10:40:02 -08:00 |
|
|
0882878166
|
Add Cosine similarities to README.md
|
2020-11-02 09:48:10 -08:00 |
|