| 
							
							
								 Nathan TeBlunthuis | 3a480940e9 | fix error logging. | 2025-08-07 09:38:42 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | a1f94078c4 | improve style. | 2025-08-07 09:20:49 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 329d682f4c | fix asyncio bug. | 2025-08-07 09:10:16 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 19f67b3679 | try fixing coro issue. | 2025-08-07 08:58:45 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 9b3237014d | fix a couple possible bugs. | 2025-08-05 23:20:04 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | bd8c30d80f | fix raising exception. | 2025-08-04 07:57:31 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | adf02310ef | bugfix | 2025-08-03 21:54:41 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | a563eaf6fc | Timeout diffs. | 2025-08-03 20:04:51 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 730c678f51 | disable cache limits. | 2025-08-03 11:50:57 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 77f367d95e | Revert changes related to row-buffering to just "increase cache size." This reverts commit 1f08c01cf1. | 2025-08-03 09:37:35 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 1f08c01cf1 | increase cache size. | 2025-08-03 09:24:35 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 2f853a879d | reduce memory a tich more. | 2025-08-01 20:10:38 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 9799919470 | reduce memory even more. | 2025-08-01 19:59:36 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 7528dc8b8e | try reducing memory more. | 2025-08-01 19:52:18 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 615d630ff0 | reduce memory usage. | 2025-08-01 19:45:21 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 32bc05ddfd | set cache limits. | 2025-08-01 19:30:46 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | ef78310580 | reduce cache more. | 2025-08-01 19:25:50 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 40a92d2db6 | reduce wd2 cache size | 2025-08-01 19:18:26 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 6bec0de9b2 | configure wikidiff2. | 2025-08-01 18:53:07 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 54e996b910 | configure pywikidiff2 cache limits. | 2025-08-01 09:24:54 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 83c92d1a37 | decrease moved paragraph detection cutoff to see if that fixes memory issue. | 2025-07-22 13:29:01 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 076df15740 | force garbage collection. | 2025-07-22 13:13:18 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 6557e25af7 | make a new pywikidiff2 object for each revision to reduce memory. | 2025-07-22 09:50:30 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | d20075b323 | add memray for debugging memory usage. | 2025-07-17 15:17:23 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 6d03cac28d | decrease batch_size. | 2025-07-15 19:37:26 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 3a44cfd4da | increase batch size. | 2025-07-15 19:09:36 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 0fbe788e31 | use ichunked instead of chunked. | 2025-07-15 18:25:44 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 37d095199a | inc version. | 2025-07-15 15:37:55 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 6b04791de2 | reduce batch size. | 2025-07-15 15:31:00 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 507335941d | Revert "Merge branch 'compute-diffs' into HEAD" This reverts commit 907a35323e, reversing changes made to c40506137b. | 2025-07-15 15:23:50 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 907a35323e | Merge branch 'compute-diffs' into HEAD | 2025-07-15 15:23:13 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | c40506137b | make wikiq memory efficient again via batch processing. | 2025-07-15 15:20:17 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | e53e7ada5d | try fixing the memory problem. | 2025-07-14 18:58:27 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 76d54ae597 | support partitioning output parquet by namespace. | 2025-07-07 20:58:43 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | c9fb94ccc0 | fix tests. | 2025-07-07 20:25:00 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | ac1dd47b08 | Merge branch 'compute-diffs' of gitea:collective/mediawiki_dump_tools into compute-diffs | 2025-07-07 20:16:38 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | c597a6b7f4 | refactor into src-layout package. | 2025-07-07 20:14:13 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | a2984bc656 | refactor into src-layout package. | 2025-07-07 20:13:17 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 56c90fe1cc | add missing files + add sorted_columns metadata. | 2025-07-07 19:08:31 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | d6c4c0a416 | add (optional) diff and text columns to output. | 2025-07-07 14:39:52 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | a8e9e7f4fd | wikidiff2 integration: pwr complete. test for pwr based on wikidiff2. | 2025-07-07 12:18:22 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 58c595bf0b | add test files. | 2025-07-07 11:29:10 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | cc96bb5f3f | remove server. | 2025-07-07 11:21:28 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 14e819e565 | compare pywikidiff2 to making requests to wikidiff2. | 2025-07-07 10:51:11 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 4654911533 | almost there. working out edge cases. | 2025-07-03 21:32:44 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | cf1fb61a84 | WIP: fixing bugs and adding newlines to output. | 2025-07-02 13:31:32 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | c4acc711d2 | finish support for paragraph move. | 2025-07-01 11:19:00 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 20de5b93f9 | Merge branch 'tmp' into compute-diffs | 2025-06-30 20:52:23 -05:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 37734ed092 | add test. | 2025-06-30 15:45:56 -07:00 |  | 
			
				
					| 
							
							
								 Nathan TeBlunthuis | 5a3e4102b5 | got wikidiff2 persistence working except for paragraph moves. | 2025-06-30 15:37:54 -07:00 |  |