Commit Graph

22 Commits

Author              SHA1        Message  Date
Nathan TeBlunthuis  9799919470  reduce memory even more.  2025-08-01 19:59:36 -07:00
Nathan TeBlunthuis  7528dc8b8e  try reducing memory more.  2025-08-01 19:52:18 -07:00
Nathan TeBlunthuis  615d630ff0  reduce memory usage.  2025-08-01 19:45:21 -07:00
Nathan TeBlunthuis  32bc05ddfd  set cache limits.  2025-08-01 19:30:46 -07:00
Nathan TeBlunthuis  ef78310580  reduce cache more.  2025-08-01 19:25:50 -07:00
Nathan TeBlunthuis  40a92d2db6  reduce wd2 cache size  2025-08-01 19:18:26 -07:00
Nathan TeBlunthuis  6bec0de9b2  configure wikidiff2.  2025-08-01 18:53:07 -07:00
Nathan TeBlunthuis  54e996b910  configure pywikidiff2 cache limits.  2025-08-01 09:24:54 -07:00
Nathan TeBlunthuis  83c92d1a37  decrease moved paragraph detection cutoff to see if that fixes memory issue.  2025-07-22 13:29:01 -07:00
Nathan TeBlunthuis  076df15740  force garbage collection.  2025-07-22 13:13:18 -07:00
Nathan TeBlunthuis  6557e25af7  make a new pywikidiff2 object for each revision to reduce memory.  2025-07-22 09:50:30 -07:00
Nathan TeBlunthuis  d20075b323  add memray for debugging memory usage.  2025-07-17 15:17:23 -07:00
Nathan TeBlunthuis  6d03cac28d  decrease batch_size.  2025-07-15 19:37:26 -07:00
Nathan TeBlunthuis  3a44cfd4da  increase batch size.  2025-07-15 19:09:36 -07:00
Nathan TeBlunthuis  0fbe788e31  use ichunked instead of chunked.  2025-07-15 18:25:44 -07:00
Nathan TeBlunthuis  6b04791de2  reduce batch size.  2025-07-15 15:31:00 -07:00
Nathan TeBlunthuis  507335941d  Revert "Merge branch 'compute-diffs' into HEAD"  2025-07-15 15:23:50 -07:00
                                This reverts commit 907a35323e, reversing changes made to c40506137b.
Nathan TeBlunthuis  907a35323e  Merge branch 'compute-diffs' into HEAD  2025-07-15 15:23:13 -07:00
Nathan TeBlunthuis  c40506137b  make wikiq memory efficient again via batch processing.  2025-07-15 15:20:17 -07:00
Nathan TeBlunthuis  e53e7ada5d  try fixing the memory problem.  2025-07-14 18:58:27 -07:00
Nathan TeBlunthuis  76d54ae597  support partitioning output parquet by namespace.  2025-07-07 20:58:43 -07:00
Nathan TeBlunthuis  c597a6b7f4  refactor into src-layout package.  2025-07-07 20:14:13 -07:00