This should make the PR easier to read.
Some unused code likely remains, but this removes the bulk of it.
Signed-off-by: Will Beason <willbeason@gmail.com>
This allows columns to be made optional, as desired, and makes
adding new columns straightforward without affecting existing
behavior.
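The following is a minimal sketch of a column registry that illustrates the idea. The `Column` class, the `COLUMNS` list, `build_row`, and the plain-dict revisions are all hypothetical and are not the code in this change. Each column is self-contained, so adding one means appending an entry, and optional columns are emitted only when requested.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical sketch: each output column is a name plus a function that
# extracts its value from a revision (a plain dict here, for illustration).
@dataclass(frozen=True)
class Column:
    name: str
    extract: Callable[[Dict[str, Any]], Any]
    optional: bool = False


# Existing columns are always emitted; optional ones only when enabled.
COLUMNS: List[Column] = [
    Column("revid", lambda rev: rev["id"]),
    Column("editorid", lambda rev: rev.get("editor_id")),
    Column("text_chars", lambda rev: len(rev.get("text", "")), optional=True),
]


def build_row(rev: Dict[str, Any], enabled_optional: frozenset = frozenset()) -> Dict[str, Any]:
    """Build one output row, skipping optional columns that were not enabled."""
    return {
        col.name: col.extract(rev)
        for col in COLUMNS
        if not col.optional or col.name in enabled_optional
    }
```

With this shape, existing output stays the same unless an optional column is explicitly enabled.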
Signed-off-by: Will Beason <willbeason@gmail.com>
This is optional and doesn't affect existing users: the behavior when
an output directory is specified is unchanged.
Tests no longer need to copy large files as part of their
execution, since they can ask for output to be written to explicit
locations.
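The following is a minimal sketch of how an explicit output path can coexist with the preexisting directory behavior. `resolve_output_path`, its parameters, and the naming rule (input file name up to its first dot, plus a suffix) are assumptions for illustration and are not wikiq's actual code.

```python
from pathlib import Path


def resolve_output_path(input_path: str, output: str, suffix: str = ".tsv") -> Path:
    """Hypothetical sketch of choosing where to write results.

    - output is an existing directory: keep the preexisting behavior and
      name the file after the input dump.
    - otherwise: treat output as an explicit file path.
    """
    out = Path(output)
    if out.is_dir():
        # Drop extensions such as ".xml.bz2" from the dump's file name.
        stem = Path(input_path).name.split(".")[0]
        return out / (stem + suffix)
    return out
```

Because a directory argument still takes the first branch, callers that already pass a directory see no change.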
Signed-off-by: Will Beason <willbeason@gmail.com>
Use a simple loop when we aren't collapsing users.
Add a test that covers the case where users are deleted.
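The following is a rough sketch of the two code paths. It assumes revisions are plain dicts with an `editor` key that is None when the user was deleted. `iter_rows` and `collapsed_revs` are hypothetical names here. Grouping consecutive deleted-user revisions together is an assumption of the sketch and is not necessarily what wikiq does.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator


def iter_rows(revisions: Iterable[Dict], collapse_user: bool) -> Iterator[Dict]:
    """Hypothetical sketch: yield one row per revision, or one row per run of
    consecutive revisions by the same editor when collapse_user is set."""
    if not collapse_user:
        # Simple loop: no grouping state is needed.
        for rev in revisions:
            yield rev
        return
    # Assumption: deleted users (editor None) that appear consecutively
    # are grouped together like any other repeated key.
    for _editor, run in groupby(revisions, key=lambda rev: rev.get("editor")):
        run_list = list(run)
        # Keep the last revision of the run and record how many it stands for.
        last = dict(run_list[-1])
        last["collapsed_revs"] = len(run_list)
        yield last
```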
Signed-off-by: Will Beason <willbeason@gmail.com>
This requires some data smoothing so that the DataFrames returned by
read_table and read_parquet are close enough to compare, but the test
now passes and validates that the data match.
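The following sketch shows one way to do this kind of smoothing with pandas: convert both frames to pandas' nullable dtypes with `DataFrame.convert_dtypes()` before comparing them. The column names and the in-memory stand-ins for the TSV and Parquet output are illustrative assumptions. The smoothing in the actual test may differ.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal


def smooth(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical normalization: map every column to pandas' nullable dtypes,
    so a float64 column holding integers and NaN (typical of TSV input)
    compares equal to an Int64 column holding integers and <NA>."""
    return df.convert_dtypes()


# Stand-in for TSV output read back with read_table (empty field -> NaN).
tsv = "revid\teditorid\tanon\n1\t42\tFalse\n2\t\tTrue\n"
from_tsv = pd.read_table(io.StringIO(tsv))

# Stand-in for Parquet output read back with read_parquet.
from_parquet = pd.DataFrame(
    {
        "revid": [1, 2],
        "editorid": pd.array([42, None], dtype="Int64"),
        "anon": [False, True],
    }
)

# Raises AssertionError if the smoothed frames differ in values or dtypes.
assert_frame_equal(smooth(from_tsv), smooth(from_parquet))
```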
Signed-off-by: Will Beason <willbeason@gmail.com>
Changed the handling of anonymous edits so that wikiq uses a
consistent type for editor ids. Parquet columns can mix int64 and
None, but not int64 and strings; previously the code used the empty
string to denote anonymous editors.
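The following is a minimal sketch of the idea. It assumes the raw editor id arrives as a string or as None. `normalize_editor_id` is a hypothetical helper and is not the function in this change. Anonymous editors map to None, which Parquet stores as a null in an int64 column, rather than to the empty string.

```python
from typing import Optional


def normalize_editor_id(raw: Optional[str]) -> Optional[int]:
    """Hypothetical helper: map a contributor id to an int, using None
    (a Parquet null) for anonymous editors instead of ""."""
    if raw is None or raw == "":
        return None
    return int(raw)


# Every value is now an int or None, so the column has a single
# nullable integer type.
ids = [normalize_editor_id(x) for x in ["42", "", None, "7"]]
```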
Tests failing. Don't merge yet.
Signed-off-by: Will Beason <willbeason@gmail.com>