| Nathan TeBlunthuis | 02ec11f726 | no longer need to convert from spark dates into isoformat. | 2024-12-28 13:55:54 -08:00 |
| Nathan TeBlunthuis | 104b708ff6 | use duckdb not spark to prepare for weekly similarities. | 2024-12-28 13:45:17 -08:00 |
| Nathan TeBlunthuis | 74ee86e443 | add weekly_cosine_similarities script. | 2024-12-25 21:15:38 -08:00 |
| Nathan TeBlunthuis | a8a92d30df | bugfix | 2024-12-19 23:34:55 -08:00 |
| Nathan TeBlunthuis | 638ab78375 | comment out config. | 2024-12-19 23:32:16 -08:00 |
| Nathan TeBlunthuis | 8cb75c8354 | typo fix. | 2024-12-19 20:10:34 -08:00 |
| Nathan TeBlunthuis | 0bbdc6bd5e | typo fix. | 2024-12-19 20:09:00 -08:00 |
| Nathan TeBlunthuis | 8b69801c8d | correct number of partitions. | 2024-12-19 19:39:18 -08:00 |
| Nathan TeBlunthuis | 189330198c | repartition for parallelism. | 2024-12-19 17:53:27 -08:00 |
| Nathan TeBlunthuis | c6c9ec173b | add shebang | 2024-12-15 18:47:07 -08:00 |
| Nathan TeBlunthuis | 52694e0498 | typofix | 2024-12-15 08:23:06 -08:00 |
| Nathan TeBlunthuis | cb2f2c9717 | make executable. | 2024-12-15 08:18:42 -08:00 |
| Nathan TeBlunthuis | 3d192ab82f | Merge remote-tracking branch 'origin/icwsm_dataverse' | 2024-12-12 07:45:06 -08:00 |
| Nathan TeBlunthuis | e2b6c1b481 | configure to use the g2-cpu node. | 2024-12-12 07:17:10 -08:00 |
| Nathan TeBlunthuis | 51234f1070 | add inpath param for tfidf_authors_weekly. | 2024-12-03 10:16:23 -08:00 |
| Nathan TeBlunthuis | 0a6ad65baf | add shebang | 2024-12-03 09:06:40 -08:00 |
| Nathan TeBlunthuis | 7096785cb9 | make exe | 2024-12-03 09:05:44 -08:00 |
| Nathan TeBlunthuis | 355d014d5f | pass path into tfidf function. | 2024-12-02 08:03:19 -08:00 |
|  | 07b0dff9bc | changes for archiving. | 2023-05-23 17:18:19 -07:00 |
|  | 55b75ea6fc | Merge remote-tracking branch 'refs/remotes/origin/excise_reindex' into excise_reindex | 2022-04-06 11:14:13 -07:00 |
|  | 197518a222 | git-annex in | 2022-04-06 11:11:11 -07:00 |
|  | 53f5b8c03c | add note to try other tf normalization strategies. | 2022-03-31 12:17:16 -07:00 |
|  | 7b130a30af | commit changes from smap project. | 2022-01-19 13:57:02 -08:00 |
|  | 541e125b28 | lsi support for weekly similarities | 2021-08-11 22:48:33 -07:00 |
|  | ce549c6c97 | Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex | 2021-08-03 15:13:21 -07:00 |
|  | 6e43294a41 | Updates to similarities code for smap project. | 2021-08-03 15:06:48 -07:00 |
|  | 2d21ff1137 | Merge branch 'master' of code:cdsc_reddit into excise_reindex | 2021-08-03 15:02:08 -07:00 |
| Nate E TeBlunthuis | cf86c7492c | update clustering scripts | 2021-08-03 14:55:02 -07:00 |
| Nate E TeBlunthuis | 0b95bea30e | support isolates in visualization | 2021-05-13 22:26:58 -07:00 |
| Nate E TeBlunthuis | e1c9d9af6f | Remove 'exclude phrases' parameter. | 2021-05-03 10:37:09 -07:00 |
| Nate E TeBlunthuis | 7df8436067 | Use Latent semantic indexing and hdbscan | 2021-05-02 23:39:55 -07:00 |
| Nate E TeBlunthuis | 36b24ee933 | reindex tfidf in memory instead of using spark | 2021-04-30 12:48:19 -07:00 |
| Nate E TeBlunthuis | 6a3bfa26ee | bugfix | 2021-04-26 22:31:05 -07:00 |
| Nate E TeBlunthuis | 0fe120e4ab | support passing in list of tfidf vectors. Also lowercases included subreddits. | 2021-04-26 11:44:56 -07:00 |
| Nate E TeBlunthuis | 003a48aea5 | bugfix in weekly similarities | 2021-04-22 10:37:04 -07:00 |
| Nate E TeBlunthuis | f0176d9f0d | Changes for cosine similarities on klone. | 2021-04-05 23:21:06 -07:00 |
| Nate E TeBlunthuis | 06430903f0 | add included_subreddits parameter to cosine similarities. | 2021-02-22 18:38:34 -08:00 |
| Nate E TeBlunthuis | 4dc949de5f | Changes from hyak. | 2021-02-22 16:03:48 -08:00 |
| Nate E TeBlunthuis | 3155600514 | remove nsfw subs from topN | 2020-12-28 21:11:44 -08:00 |
| Nate E TeBlunthuis | 4e20dce188 | Updating to support wang-style user overlaps. | 2020-12-24 22:38:04 -08:00 |
| Nate E TeBlunthuis | 56269deee3 | Some improvements to run affinity clustering on larger dataset and compute density. | 2020-12-12 20:42:47 -08:00 |
| Nate E TeBlunthuis | e6294b5b90 | Refactor and reorganze. | 2020-12-08 17:32:20 -08:00 |