| Author | Commit | Message | Date |
|---|---|---|---|
| Nathan TeBlunthuis | 6bec0de9b2 | configure wikidiff2. | 2025-08-01 18:53:07 -07:00 |
| Nathan TeBlunthuis | 54e996b910 | configure pywikidiff2 cache limits. | 2025-08-01 09:24:54 -07:00 |
| Nathan TeBlunthuis | 83c92d1a37 | decrease moved paragraph detection cutoff to see if that fixes memory issue. | 2025-07-22 13:29:01 -07:00 |
| Nathan TeBlunthuis | 076df15740 | force garbage collection. | 2025-07-22 13:13:18 -07:00 |
| Nathan TeBlunthuis | 6557e25af7 | make a new pywikidiff2 object for each revision to reduce memory. | 2025-07-22 09:50:30 -07:00 |
| Nathan TeBlunthuis | d20075b323 | add memray for debugging memory usage. | 2025-07-17 15:17:23 -07:00 |
| Nathan TeBlunthuis | 6d03cac28d | decrease batch_size. | 2025-07-15 19:37:26 -07:00 |
| Nathan TeBlunthuis | 3a44cfd4da | increase batch size. | 2025-07-15 19:09:36 -07:00 |
| Nathan TeBlunthuis | 0fbe788e31 | use ichunked instead of chunked. | 2025-07-15 18:25:44 -07:00 |
| Nathan TeBlunthuis | 6b04791de2 | reduce batch size. | 2025-07-15 15:31:00 -07:00 |
| Nathan TeBlunthuis | 507335941d | Revert "Merge branch 'compute-diffs' into HEAD". This reverts commit 907a35323e, reversing changes made to c40506137b. | 2025-07-15 15:23:50 -07:00 |
| Nathan TeBlunthuis | 907a35323e | Merge branch 'compute-diffs' into HEAD | 2025-07-15 15:23:13 -07:00 |
| Nathan TeBlunthuis | c40506137b | make wikiq memory efficient again via batch processing. | 2025-07-15 15:20:17 -07:00 |
| Nathan TeBlunthuis | e53e7ada5d | try fixing the memory problem. | 2025-07-14 18:58:27 -07:00 |
| Nathan TeBlunthuis | 76d54ae597 | support partitioning output parquet by namespace. | 2025-07-07 20:58:43 -07:00 |
| Nathan TeBlunthuis | c597a6b7f4 | refactor into src-layout package. | 2025-07-07 20:14:13 -07:00 |
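
Several of the commits above (c40506137b, 0fbe788e31, 076df15740) concern batch processing, lazy chunking, and forced garbage collection. The names `chunked` and `ichunked` match functions in the `more_itertools` package, though the log does not confirm that this is what the project uses. The sketch below is not the repository's code: it is a stdlib-only illustration of the general idea, and `lazy_chunks`, `process_in_batches`, and the per-batch "work" (summing text lengths) are hypothetical stand-ins.

```python
import gc
from itertools import chain, islice
from typing import Iterable, Iterator, List


def lazy_chunks(items: Iterable[str], size: int) -> Iterator[Iterator[str]]:
    """Yield successive chunks of up to `size` items as lazy iterators.

    Unlike a list-based chunker, no chunk is materialized up front.
    Caveat of this simple version: each chunk must be fully consumed
    before the next one is requested.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    for first in iterator:
        # Re-attach the item pulled by the for loop, then lazily take the rest.
        yield chain([first], islice(iterator, size - 1))


def process_in_batches(revision_texts: Iterable[str], batch_size: int) -> List[int]:
    """Hypothetical per-batch work: total text length in each batch."""
    totals = []
    for chunk in lazy_chunks(revision_texts, batch_size):
        totals.append(sum(len(text) for text in chunk))
        # Explicit collection between batches, in the spirit of the
        # "force garbage collection" commit; whether it helps depends
        # on the workload.
        gc.collect()
    return totals
```

Passing a generator rather than a list as `revision_texts` keeps only one item at a time in memory here, which is the kind of saving that motivates lazy chunking over list-based batches.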