Nathan TeBlunthuis
|
a1f94078c4
|
improve style.
|
2025-08-07 09:20:49 -07:00 |
|
Nathan TeBlunthuis
|
329d682f4c
|
fix asyncio bug.
|
2025-08-07 09:10:16 -07:00 |
|
Nathan TeBlunthuis
|
19f67b3679
|
try fixing coro issue.
|
2025-08-07 08:58:45 -07:00 |
|
Nathan TeBlunthuis
|
9b3237014d
|
fix a couple possible bugs.
|
2025-08-05 23:20:04 -07:00 |
|
Nathan TeBlunthuis
|
bd8c30d80f
|
fix raising exception.
|
2025-08-04 07:57:31 -07:00 |
|
Nathan TeBlunthuis
|
adf02310ef
|
bugfix
|
2025-08-03 21:54:41 -07:00 |
|
Nathan TeBlunthuis
|
a563eaf6fc
|
Timeout diffs.
|
2025-08-03 20:04:51 -07:00 |
|
Nathan TeBlunthuis
|
730c678f51
|
disable cache limits.
|
2025-08-03 11:50:57 -07:00 |
|
Nathan TeBlunthuis
|
77f367d95e
|
Revert changes related to row-buffering to just "increase cache size."
This reverts commit 1f08c01cf1.
|
2025-08-03 09:37:35 -07:00 |
|
Nathan TeBlunthuis
|
1f08c01cf1
|
increase cache size.
|
2025-08-03 09:24:35 -07:00 |
|
Nathan TeBlunthuis
|
2f853a879d
|
reduce memory a titch more.
|
2025-08-01 20:10:38 -07:00 |
|
Nathan TeBlunthuis
|
9799919470
|
reduce memory even more.
|
2025-08-01 19:59:36 -07:00 |
|
Nathan TeBlunthuis
|
7528dc8b8e
|
try reducing memory more.
|
2025-08-01 19:52:18 -07:00 |
|
Nathan TeBlunthuis
|
615d630ff0
|
reduce memory usage.
|
2025-08-01 19:45:21 -07:00 |
|
Nathan TeBlunthuis
|
32bc05ddfd
|
set cache limits.
|
2025-08-01 19:30:46 -07:00 |
|
Nathan TeBlunthuis
|
ef78310580
|
reduce cache more.
|
2025-08-01 19:25:50 -07:00 |
|
Nathan TeBlunthuis
|
40a92d2db6
|
reduce wd2 cache size
|
2025-08-01 19:18:26 -07:00 |
|
Nathan TeBlunthuis
|
6bec0de9b2
|
configure wikidiff2.
|
2025-08-01 18:53:07 -07:00 |
|
Nathan TeBlunthuis
|
54e996b910
|
configure pywikidiff2 cache limits.
|
2025-08-01 09:24:54 -07:00 |
|
Nathan TeBlunthuis
|
83c92d1a37
|
decrease moved paragraph detection cutoff to see if that fixes memory issue.
|
2025-07-22 13:29:01 -07:00 |
|
Nathan TeBlunthuis
|
076df15740
|
force garbage collection.
|
2025-07-22 13:13:18 -07:00 |
|
Nathan TeBlunthuis
|
6557e25af7
|
make a new pywikidiff2 object for each revision to reduce memory.
|
2025-07-22 09:50:30 -07:00 |
|
Nathan TeBlunthuis
|
d20075b323
|
add memray for debugging memory usage.
|
2025-07-17 15:17:23 -07:00 |
|
Nathan TeBlunthuis
|
6d03cac28d
|
decrease batch_size.
|
2025-07-15 19:37:26 -07:00 |
|
Nathan TeBlunthuis
|
3a44cfd4da
|
increase batch size.
|
2025-07-15 19:09:36 -07:00 |
|
Nathan TeBlunthuis
|
0fbe788e31
|
use ichunked instead of chunked.
|
2025-07-15 18:25:44 -07:00 |
|
Nathan TeBlunthuis
|
37d095199a
|
inc version.
|
2025-07-15 15:37:55 -07:00 |
|
Nathan TeBlunthuis
|
6b04791de2
|
reduce batch size.
|
2025-07-15 15:31:00 -07:00 |
|
Nathan TeBlunthuis
|
507335941d
|
Revert "Merge branch 'compute-diffs' into HEAD"
This reverts commit 907a35323e, reversing
changes made to c40506137b.
|
2025-07-15 15:23:50 -07:00 |
|
Nathan TeBlunthuis
|
907a35323e
|
Merge branch 'compute-diffs' into HEAD
|
2025-07-15 15:23:13 -07:00 |
|
Nathan TeBlunthuis
|
c40506137b
|
make wikiq memory efficient again via batch processing.
|
2025-07-15 15:20:17 -07:00 |
|
Nathan TeBlunthuis
|
e53e7ada5d
|
try fixing the memory problem.
|
2025-07-14 18:58:27 -07:00 |
|
Nathan TeBlunthuis
|
76d54ae597
|
support partitioning output parquet by namespace.
|
2025-07-07 20:58:43 -07:00 |
|
Nathan TeBlunthuis
|
c9fb94ccc0
|
fix tests.
|
2025-07-07 20:25:00 -07:00 |
|
Nathan TeBlunthuis
|
ac1dd47b08
|
Merge branch 'compute-diffs' of gitea:collective/mediawiki_dump_tools into compute-diffs
|
2025-07-07 20:16:38 -07:00 |
|
Nathan TeBlunthuis
|
c597a6b7f4
|
refactor into src-layout package.
|
2025-07-07 20:14:13 -07:00 |
|
Nathan TeBlunthuis
|
a2984bc656
|
refactor into src-layout package.
|
2025-07-07 20:13:17 -07:00 |
|
Nathan TeBlunthuis
|
56c90fe1cc
|
add missing files + add sorted_columns metadata.
|
2025-07-07 19:08:31 -07:00 |
|
Nathan TeBlunthuis
|
d6c4c0a416
|
add (optional) diff and text columns to output.
|
2025-07-07 14:39:52 -07:00 |
|
Nathan TeBlunthuis
|
a8e9e7f4fd
|
wikidiff2 integration: pwr complete.
test for pwr based on wikidiff2.
|
2025-07-07 12:18:22 -07:00 |
|
Nathan TeBlunthuis
|
58c595bf0b
|
add test files.
|
2025-07-07 11:29:10 -07:00 |
|
Nathan TeBlunthuis
|
cc96bb5f3f
|
remove server.
|
2025-07-07 11:21:28 -07:00 |
|
Nathan TeBlunthuis
|
14e819e565
|
compare pywikidiff2 to making requests to wikidiff2.
|
2025-07-07 10:51:11 -07:00 |
|
Nathan TeBlunthuis
|
4654911533
|
almost there. working out edge cases.
|
2025-07-03 21:32:44 -07:00 |
|
Nathan TeBlunthuis
|
cf1fb61a84
|
WIP: fixing bugs and adding newlines to output.
|
2025-07-02 13:31:32 -07:00 |
|
Nathan TeBlunthuis
|
c4acc711d2
|
finish support for paragraph move.
|
2025-07-01 11:19:00 -07:00 |
|
Nathan TeBlunthuis
|
20de5b93f9
|
Merge branch 'tmp' into compute-diffs
|
2025-06-30 20:52:23 -05:00 |
|
Nathan TeBlunthuis
|
37734ed092
|
add test.
|
2025-06-30 15:45:56 -07:00 |
|
Nathan TeBlunthuis
|
5a3e4102b5
|
got wikidiff2 persistence working except for paragraph moves.
|
2025-06-30 15:37:54 -07:00 |
|
Nathan TeBlunthuis
|
186cb82fb8
|
some work on wiki_diff_matcher.py
|
2025-06-27 07:13:41 -07:00 |
|