Commit Graph

65 Commits

Author SHA1 Message Date
Nathan TeBlunthuis
39fec0820d use my version of mwxml since it fixes a bug. 2025-05-28 21:13:18 -07:00
Nathan TeBlunthuis
383ee03250 Merge branch 'parquet_support' of gitea:collective/mediawiki_dump_tools into parquet_support 2025-05-28 21:09:13 -07:00
Nathan TeBlunthuis
15e9234903 adding pyproject.toml 2025-05-28 20:59:55 -07:00
Nathan TeBlunthuis
8c7d46472f Merge branch 'parquet_support' of code:mediawiki_dump_tools into parquet_support 2025-05-28 20:54:52 -07:00
Nathan TeBlunthuis
3c7fb088d6 fix schema bugs. 2025-05-28 20:54:42 -07:00
Will Beason
df0ad1de63 Finish test standardization
Test logic is executed within the WikiqTestCase, while WikiqTester
handles creating and managing the variables tests need.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-28 10:11:58 -05:00
Will Beason
f3e6cc9392 Begin refactor of tests to make new tests easier to write
Handle file naming logic centrally rather than requiring a dedicated
class per input file.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-28 09:11:36 -05:00
Will Beason
c8b14c3303 Refactor test temporary file logic and wikiq call pattern
Test file refreshing and path computation are now handled by a helper.

The wikiq command is now constructed and handled by a single method
rather than in several ad-hoc ways.

The last places relying on the working directory are now removed.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-27 16:24:07 -05:00
Will Beason
4d3900b541 Standardize calling for wikiq in tests
This way failures show the output of stderr/etc.

Also create path constant strings for use in tests to avoid repetition
and make changes easier.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-27 14:27:49 -05:00
Will Beason
ebc57864f2 Make tests runnable from anywhere
Tests no longer implicitly require that the caller be in
a specific working directory.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-27 13:40:57 -05:00
Will Beason
3d0bf89938 Move main logic to main()
This avoids:
1) the main function running when sourcing the file
2) creating many globally scoped variables in the main logic

Also begin refactor of test output file logic

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-27 11:10:42 -05:00
Will Beason
6d133575c7 Remove resource leaks from tests
Close subprocesses within tests to fix resource leak warning.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-26 15:08:47 -05:00
Will Beason
09a84e7d11 Reformat Wikiq_Unit_Test.py
Separate out reformatting from editing.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-26 15:07:39 -05:00
Will Beason
9c5bf577e6 Remove unused dependencies and fix spacing
The "mw" and "numpy" dependencies were unneeded.

Spaces and tabs were inconsistently used.
They are now used consistently; the changes were made via auto-formatter.

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-26 14:15:01 -05:00
Will Beason
4804ecc4b3 Add additional test dependencies
These are now noted in requirements.txt

Also make dependency on 7zip and ffmpeg explicit in README

Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-26 12:29:49 -05:00
Will Beason
7a4c41159c Exclude JetBrains config folder in .gitignore
Signed-off-by: Will Beason <willbeason@gmail.com>
2025-05-26 10:48:17 -05:00
1aea601a30 [Bugfix] Call the correct matchmake function. 2021-11-16 16:53:21 -08:00
c437b357db rename matchmake functions 2021-11-11 19:09:41 -08:00
bb83d62b74 Add some descriptive comments. 2021-10-19 16:55:24 -07:00
c285402683 add todos to readme 2021-10-18 14:14:11 -07:00
b1bea09ad6 fix bugs and unit tests 2021-10-18 13:33:05 -07:00
9a0c157ebb bugfix 2021-10-18 10:15:03 -07:00
ae870fed0b parquet path is code-complete 2021-10-17 21:46:31 -07:00
26f6d8f984 remove dependency on pandas. 2021-10-17 20:24:33 -07:00
ae9a241747 use dataclasses and pyarrow for types. 2021-10-17 20:21:22 -07:00
d8d20f670b initial work on parquet support 2021-10-17 13:22:22 -07:00
cdfa77d66d remove commented code 2019-11-11 11:28:48 -08:00
02b3250a36 refactor regex matching in a tidier object oriented style 2019-11-09 13:07:46 -08:00
414cc5ff2d validate tests and add asserts and baselines for regex tests. 2019-11-09 12:19:55 -08:00
sohyeonhwang
4ccde84529 added regex scanner v2's dump unit test file regextest.xml.bz2 2019-11-07 14:06:15 -06:00
sohyeonhwang
f147e1d899 merging pull containing revert-radius with 2nd version of regex scanner w/ unit tests 2019-11-07 13:28:17 -06:00
c84844cfb5 add unit tests for configuring revert_radius 2019-10-07 15:02:30 -07:00
c4416d0f1b make revert radius configurable 2019-10-07 13:57:49 -07:00
7b856bec86 Merge branch 'master' into regex_scanner 2019-10-05 18:17:03 -07:00
324ccc8e26 update baseline outputs 2019-10-05 16:36:07 -07:00
17529cdd48 bugfix, remove old legacy persistence flag 2019-10-05 16:13:11 -07:00
sohyeonhwang
7bf4559ceb changes for regex scanner addition 2019-10-05 15:36:58 -05:00
fb052ffa33 don't compute persistence by default 2019-09-22 15:54:17 -07:00
e871023ff5 elaborate docstring for persistence 2019-09-22 15:11:59 -07:00
7d62ff9fb7 improve help for namespace-include 2018-09-03 11:30:12 -07:00
f7f5bf8fd4 sub assertEquals assertEqual 2018-09-03 11:21:49 -07:00
Nate E TeBlunthuis
f784c77f60 add namespace filter parameter 2018-09-03 11:13:48 -07:00
317bafb50d Merge branch 'advanced_persistence' of code.communitydata.cc:mediawiki_dump_tools into advanced_persistence 2018-08-23 19:00:49 -07:00
7cd0bf3b9e Add parameter for selecting specific namespaces. 2018-08-23 18:49:32 -07:00
d93769c21f Merge branch 'advanced_persistence' of code.communitydata.cc:mediawiki_dump_tools into advanced_persistence 2018-08-23 18:27:09 -07:00
Nate E TeBlunthuis
afd40c1a45 Merge branch 'advanced_persistence' of code.communitydata.cc:/mediawiki_dump_tools into advanced_persistence 2018-08-23 18:25:51 -07:00
Nate E TeBlunthuis
e4222c45dd add namespace filter parameter 2018-08-23 18:25:08 -07:00
Nate E TeBlunthuis
829ffcffae Merge branch 'advanced_persistence' of code.communitydata.cc:/mediawiki_dump_tools into advanced_persistence 2018-08-23 18:23:36 -07:00
Nate E TeBlunthuis
776b73519a add namespace filter parameter 2018-08-23 18:23:23 -07:00
Nate E TeBlunthuis
5b6aaad862 add namespace filter parameter 2018-08-23 18:02:56 -07:00