Nathan TeBlunthuis
|
3fea1f9388
|
sort and partition the term frequencies using spark.
|
2024-12-01 13:42:13 -08:00 |
|
Nathan TeBlunthuis
|
2b023fea8d
|
bugfix
|
2024-12-01 09:58:09 -08:00 |
|
Nathan TeBlunthuis
|
88fca0f82b
|
allow posts schemas to be nullable.
|
2024-12-01 09:55:12 -08:00 |
|
Nathan TeBlunthuis
|
271cbea7d9
|
add a 'limit' parameter for testing.
|
2024-12-01 09:51:49 -08:00 |
|
Nathan TeBlunthuis
|
4218bf864b
|
debugging.
|
2024-12-01 09:39:50 -08:00 |
|
Nathan TeBlunthuis
|
22d6a6961c
|
allow authors to be null in submissions.
|
2024-11-27 20:04:05 -08:00 |
|
Nathan TeBlunthuis
|
a5ca25dd6e
|
bugfix.
|
2024-11-27 19:56:06 -08:00 |
|
Nathan TeBlunthuis
|
2e5181602b
|
bugfix.
|
2024-11-27 19:53:04 -08:00 |
|
Nathan TeBlunthuis
|
0d7f4d3cec
|
pass through stopWords.
|
2024-11-27 19:33:28 -08:00 |
|
Nathan TeBlunthuis
|
5d48c0eb55
|
pass through mwe_tokenize
|
2024-11-27 19:31:59 -08:00 |
|
Nathan TeBlunthuis
|
91cc1edf02
|
pass through mwe_pass
|
2024-11-27 19:20:49 -08:00 |
|
Nathan TeBlunthuis
|
2decdc9750
|
move function to outer scope.
|
2024-11-27 19:13:49 -08:00 |
|
Nathan TeBlunthuis
|
7da046735b
|
move function to outer scope.
|
2024-11-27 19:10:34 -08:00 |
|
Nathan TeBlunthuis
|
0631256956
|
make the output directory.
|
2024-11-27 19:06:24 -08:00 |
|
Nathan TeBlunthuis
|
8cb9683bc2
|
bugfix
|
2024-11-27 19:03:52 -08:00 |
|
Nathan TeBlunthuis
|
587e1c0022
|
bugfix.
|
2024-11-27 18:56:22 -08:00 |
|
Nathan TeBlunthuis
|
78eb16f4d6
|
more path munging.
|
2024-11-27 18:53:16 -08:00 |
|
Nathan TeBlunthuis
|
a0a6a08bf2
|
handle case where we're in a parent directory.
|
2024-11-27 18:49:03 -08:00 |
|
Nathan TeBlunthuis
|
a84b633641
|
add absolute path to call.
|
2024-11-27 18:42:29 -08:00 |
|
Nathan TeBlunthuis
|
ce7b5f92eb
|
bugfix.
|
2024-11-27 17:20:04 -08:00 |
|
Nathan TeBlunthuis
|
fbf905c740
|
rename file
|
2024-11-27 11:55:31 -08:00 |
|
Nathan TeBlunthuis
|
dd894ebf61
|
support posts in ngrams
|
2024-11-27 11:51:22 -08:00 |
|
|
9345f9de94
|
make pass keyword arg to dataframe.drop
|
2023-05-31 09:47:21 -07:00 |
|
|
07b0dff9bc
|
changes for archiving.
|
2023-05-23 17:18:19 -07:00 |
|
|
811a0d87c4
|
changes from dirty branch.
|
2023-05-18 10:29:08 -07:00 |
|
|
c190791364
|
add 2 more umap parameters
|
2022-06-08 17:27:37 -07:00 |
|
|
5a40465a62
|
add support for umap->hdbscan clustering method
|
2022-06-08 17:01:27 -07:00 |
|
|
55b75ea6fc
|
Merge remote-tracking branch 'refs/remotes/origin/excise_reindex' into excise_reindex
|
2022-04-06 11:14:13 -07:00 |
|
|
197518a222
|
git-annex in
|
2022-04-06 11:11:11 -07:00 |
|
|
53f5b8c03c
|
add note to try other tf normalization strategies.
|
2022-03-31 12:17:16 -07:00 |
|
|
65deba5e4e
|
Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex
|
2022-01-19 14:01:44 -08:00 |
|
|
7b130a30af
|
commit changes from smap project.
|
2022-01-19 13:57:02 -08:00 |
|
|
98c1317af5
|
update pushshift dumps.
|
2021-12-10 21:23:32 -08:00 |
|
|
541e125b28
|
lsi support for weekly similarities
|
2021-08-11 22:48:33 -07:00 |
|
|
b7c39a3494
|
Merge branch 'master' of code:cdsc_reddit into excise_reindex
|
2021-08-03 15:13:39 -07:00 |
|
|
ce549c6c97
|
Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex
|
2021-08-03 15:13:21 -07:00 |
|
|
6e43294a41
|
Updates to similarities code for smap project.
|
2021-08-03 15:06:48 -07:00 |
|
|
14ab979f59
|
Merge branch 'master' of code:cdsc_reddit
|
2021-08-03 15:03:40 -07:00 |
|
|
2d21ff1137
|
Merge branch 'master' of code:cdsc_reddit into excise_reindex
|
2021-08-03 15:02:08 -07:00 |
|
Nate E TeBlunthuis
|
cf86c7492c
|
update clustering scripts
|
2021-08-03 14:55:02 -07:00 |
|
Nate E TeBlunthuis
|
c6122bb429
|
Merge branch 'master' of code:cdsc_reddit
|
2021-07-28 15:32:21 -07:00 |
|
Nate E TeBlunthuis
|
596e1ff339
|
no longer do we need to get daily dumps
|
2021-07-28 15:32:04 -07:00 |
|
Nate E TeBlunthuis
|
87ffaa6858
|
script for picking the best clustering given constraints
|
2021-05-14 19:10:36 -07:00 |
|
Nate E TeBlunthuis
|
7b14db67de
|
Merge branch 'excise_reindex' of code:cdsc_reddit into excise_reindex
|
2021-05-13 22:28:31 -07:00 |
|
Nate E TeBlunthuis
|
0b95bea30e
|
support isolates in visualization
|
2021-05-13 22:26:58 -07:00 |
|
Nate E TeBlunthuis
|
582cf263ea
|
bug fix in affinity clustering
|
2021-05-13 22:26:15 -07:00 |
|
Nate E TeBlunthuis
|
8a2248fae1
|
Merge remote-tracking branch 'origin/excise_reindex' into temp
|
2021-05-10 18:32:03 -07:00 |
|
Nate E TeBlunthuis
|
47ba04aa97
|
add script for pulling cluster timeseries
|
2021-05-10 18:24:22 -07:00 |
|
Nate E TeBlunthuis
|
4cb7eeec80
|
Refactor to make a decent api.
|
2021-05-10 13:46:49 -07:00 |
|
Nate E TeBlunthuis
|
f05cb962e0
|
refactor clustering in object oriented style
|
2021-05-07 22:33:26 -07:00 |
|