Nathan TeBlunthuis | c59d251d19 | write clusters and read with spark instead of creating data frame. | 2024-12-31 14:37:50 -08:00
Nathan TeBlunthuis | a8a86c2440 | add timeseries code | 2024-12-31 16:27:04 -06:00
Nathan TeBlunthuis | 79d1826ba4 | enforce min_df constraint in counting lsi features. | 2024-12-30 16:17:31 -08:00
Nathan TeBlunthuis | 3555542862 | use min/max df constraints in counting nterms. | 2024-12-30 16:10:50 -08:00
Nathan TeBlunthuis | a9b296dd73 | bugfix | 2024-12-28 20:18:53 -08:00
Nathan TeBlunthuis | d9db21686d | remove unnecessary isoformat | 2024-12-28 20:08:12 -08:00
Nathan TeBlunthuis | 41fea31fce | bugfix | 2024-12-28 20:04:38 -08:00
Nathan TeBlunthuis | 7aa22c7385 | bugfix | 2024-12-28 20:02:24 -08:00
Nathan TeBlunthuis | f11d4cfc72 | use static tfidf (not weekly) to create tfidf matrix | 2024-12-28 20:00:53 -08:00
Nathan TeBlunthuis | 7b5ac73b2c | use static tfidf (not weekly) to create tfidf matrix | 2024-12-28 19:58:14 -08:00
Nathan TeBlunthuis | e2e7d7dbb1 | more print debugging | 2024-12-28 19:27:42 -08:00
Nathan TeBlunthuis | c317ef6475 | debugging: print the shape | 2024-12-28 19:21:24 -08:00
Nathan TeBlunthuis | c3cce0817e | bugfix | 2024-12-28 14:31:24 -08:00
Nathan TeBlunthuis | c9464f86f7 | interface fix. | 2024-12-28 14:27:56 -08:00
Nathan TeBlunthuis | f3db4efbb1 | pass nterms as int. | 2024-12-28 14:24:24 -08:00
Nathan TeBlunthuis | 27f29e63fa | typo fix. | 2024-12-28 14:18:58 -08:00
Nathan TeBlunthuis | 3f277ad99e | pass weeks as strings. | 2024-12-28 14:10:55 -08:00
Nathan TeBlunthuis | 02ec11f726 | no longer need to convert from spark dates into isoformat. | 2024-12-28 13:55:54 -08:00
Nathan TeBlunthuis | 104b708ff6 | use duckdb not spark to prepare for weekly similarities. | 2024-12-28 13:45:17 -08:00
Nathan TeBlunthuis | 74ee86e443 | add weekly_cosine_similarities script. | 2024-12-25 21:15:38 -08:00
Nathan TeBlunthuis | a8a92d30df | bugfix | 2024-12-19 23:34:55 -08:00
Nathan TeBlunthuis | 638ab78375 | comment out config. | 2024-12-19 23:32:16 -08:00
Nathan TeBlunthuis | 8cb75c8354 | typo fix. | 2024-12-19 20:10:34 -08:00
Nathan TeBlunthuis | 0bbdc6bd5e | typo fix. | 2024-12-19 20:09:00 -08:00
Nathan TeBlunthuis | 8b69801c8d | correct number of partitions. | 2024-12-19 19:39:18 -08:00
Nathan TeBlunthuis | 189330198c | repartition for parallelism. | 2024-12-19 17:53:27 -08:00
Nathan TeBlunthuis | c6c9ec173b | add shebang | 2024-12-15 18:47:07 -08:00
Nathan TeBlunthuis | 52694e0498 | typofix | 2024-12-15 08:23:06 -08:00
Nathan TeBlunthuis | cb2f2c9717 | make executable. | 2024-12-15 08:18:42 -08:00
Nathan TeBlunthuis | 9a852b9300 | was renamed to 'term_frequencies' prior to merge. | 2024-12-12 07:54:28 -08:00
Nathan TeBlunthuis | 3d192ab82f | Merge remote-tracking branch 'origin/icwsm_dataverse' | 2024-12-12 07:45:06 -08:00
Nathan TeBlunthuis | e2b6c1b481 | configure to use the g2-cpu node. | 2024-12-12 07:17:10 -08:00
Nathan TeBlunthuis | f38ec6c129 | smaller outchunk size. | 2024-12-07 13:23:44 -08:00
Nathan TeBlunthuis | 25bfc57baf | change path | 2024-12-06 08:18:20 -08:00
Nathan TeBlunthuis | c3d2834110 | use pyarrow instead of spark to write data | 2024-12-06 08:09:02 -08:00
Nathan TeBlunthuis | 8224195432 | bugfix. | 2024-12-05 11:08:18 -08:00
Nathan TeBlunthuis | 5d70d3eb6d | improve spark configuration. | 2024-12-04 10:43:13 -08:00
Nathan TeBlunthuis | 89d03dd956 | consistent naming and bugfix. | 2024-12-04 09:24:45 -08:00
Nathan TeBlunthuis | 472849ebd9 | correct output path. | 2024-12-04 09:07:10 -08:00
Nathan TeBlunthuis | 85945eae90 | correct paths. | 2024-12-04 09:06:02 -08:00
Nathan TeBlunthuis | 1cca01fb69 | use Path to make directories not os. | 2024-12-04 07:47:47 -08:00
Nathan TeBlunthuis | 39c0fa7a29 | bugfix. | 2024-12-03 19:18:38 -08:00
Nathan TeBlunthuis | 0436450ea8 | typo fix | 2024-12-03 19:16:49 -08:00
Nathan TeBlunthuis | 4be8bb6bf5 | bugfix | 2024-12-03 19:15:07 -08:00
Nathan TeBlunthuis | ec5859c311 | pass ngram_output through. | 2024-12-03 19:05:44 -08:00
Nathan TeBlunthuis | a179d608eb | bugfix. | 2024-12-03 19:02:26 -08:00
Nathan TeBlunthuis | 73dd2a96a6 | it's selftext not body | 2024-12-03 18:59:27 -08:00
Nathan TeBlunthuis | 5045d6052e | use post title and body in terms | 2024-12-03 18:53:41 -08:00
Nathan TeBlunthuis | 51234f1070 | add inpath param for tfidf_authors_weekly. | 2024-12-03 10:16:23 -08:00
Nathan TeBlunthuis | 0a6ad65baf | add shebang | 2024-12-03 09:06:40 -08:00