The big challenges here (and remaining) are as follows:
1. Deltas requires changes to be given at the token level,
whereas wikidiff2 reports changes at the byte level. Thus,
byte offsets must be converted to token indices, which often
requires tokenizing sequences of text. As-is this is done
inefficiently, often re-tokenizing sequences that have
already been tokenized. A better implementation would
tokenize incrementally, or automatically find the referenced
sequences.
2. Deltas only allows for Equal/Insert/Delete operations,
while wikidiff2 also detects paragraph moves. These paragraph
moves are NOT equivalent to Equal, as the moved paragraphs
are not guaranteed to be equivalent, just very similar.
Wikidiff2 does not report changes within moved paragraphs,
so preserving token persistence would require running a
separate diff on each moved paragraph's before/after text.
A stopgap (currently implemented) is to turn these moves
into strict deletions/insertions.
3. Memory consumption appears to be high, and sometimes the
process runs out of memory. I am unsure whether this is a
memory leak or simply that repeated re-tokenization
allocates enough memory that my machine can't keep up.
4. Deltas expects all tokens in the before/after text to
be covered by segment ranges of Equal/Insert/Delete, but
wikidiff2 never appears to emit Equal ranges, skipping
unchanged text instead. These ranges must be computed
and inserted in sequence. As-is, the code does not correctly
handle unchanged text at the end of a page.
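For the byte-to-token conversion in item 1, one way to avoid re-tokenizing is to tokenize each text once, record the byte offset at which every token starts, and answer each lookup with a binary search. This is a minimal sketch assuming the tokens concatenate back to the original text exactly and that wikidiff2 offsets are UTF-8 byte offsets; the class name ByteToTokenIndex is hypothetical and not part of Deltas.

```python
import bisect
from typing import List


class ByteToTokenIndex:
    """Hypothetical helper mapping byte offsets in a text to token indices.

    Tokenization happens once (upstream); each lookup is then a binary
    search over cumulative byte offsets instead of re-tokenizing a prefix.
    """

    def __init__(self, tokens: List[str], encoding: str = "utf-8"):
        # starts[i] is the byte offset at which token i begins.
        self.starts: List[int] = []
        offset = 0
        for token in tokens:
            self.starts.append(offset)
            offset += len(token.encode(encoding))
        self.total_bytes = offset

    def token_index(self, byte_offset: int) -> int:
        """Return the index of the token containing byte_offset.

        An offset equal to the total byte length maps to len(tokens),
        the end-of-sequence position used by half-open ranges.
        """
        if not 0 <= byte_offset <= self.total_bytes:
            raise ValueError(f"byte offset {byte_offset} out of range")
        if byte_offset == self.total_bytes:
            return len(self.starts)
        # Rightmost token whose start is <= byte_offset.
        return bisect.bisect_right(self.starts, byte_offset) - 1
```

For example, with tokens ["Hello", " ", "wörld"], byte offset 8 falls inside "wörld" (the "ö" takes two bytes in UTF-8), so it maps to token 2, while offset 12, the end of the text, maps to 3.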
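For item 4, the missing Equal ranges can be reconstructed by walking the reported changes in order and emitting an Equal for every gap between them, including the gap after the last change. This is a minimal sketch assuming the changes have already been converted to token indices, contain no moves, and are sorted by position; the Op tuple is a hypothetical stand-in shaped like a Deltas operation (a1, a2, b1, b2), not the Deltas classes themselves.

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    # Hypothetical stand-in for a Deltas-style operation:
    # before[a1:a2] corresponds to after[b1:b2].
    name: str  # "equal", "insert", or "delete"
    a1: int
    a2: int
    b1: int
    b2: int


def fill_equal_gaps(changes: List[Op], a_len: int, b_len: int) -> List[Op]:
    """Add Equal ops so the result covers both token sequences fully.

    `changes` must hold only insert/delete ops sorted by position, as a
    diff that skips unchanged text would report them.
    """
    result: List[Op] = []
    a_pos = b_pos = 0
    for op in changes:
        gap_a, gap_b = op.a1 - a_pos, op.b1 - b_pos
        # An unchanged gap must be the same length on both sides.
        if gap_a != gap_b or gap_a < 0:
            raise ValueError(f"inconsistent gap before {op}")
        if gap_a > 0:
            result.append(Op("equal", a_pos, op.a1, b_pos, op.b1))
        result.append(op)
        a_pos, b_pos = op.a2, op.b2
    # Unchanged text after the last change (the trailing case).
    if a_pos > a_len or a_len - a_pos != b_len - b_pos:
        raise ValueError("trailing unchanged text has mismatched lengths")
    if a_pos < a_len:
        result.append(Op("equal", a_pos, a_len, b_pos, b_len))
    return result
```

For a 10-token text where before[2:4] is deleted and three tokens are inserted at before position 5, the result is equal(0,2,0,2), delete(2,4,2,2), equal(4,5,2,3), insert(5,5,3,6), equal(5,10,6,11). The final Equal is the end-of-page case described above.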
Signed-off-by: Will Beason <willbeason@gmail.com>
This is inefficient as it requires an individual request per diff.
Going to try collecting the revision texts to reduce communication
overhead.
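As a sketch of collecting revision texts in bulk, assuming they come from the MediaWiki Action API: prop=revisions accepts several IDs in one revids parameter separated by "|", so one request can return many texts instead of one request per diff. The batch size of 50 is an assumption based on the usual multi-value limit for ordinary clients; large responses may also need continuation, which this sketch does not handle, and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence

# Assumed per-request limit for multi-value parameters such as `revids`.
MAX_REVIDS_PER_REQUEST = 50


def batched(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def revision_query_params(revids: Sequence[int]) -> Iterator[Dict[str, str]]:
    """Build one Action API query (as a param dict) per batch of revision IDs."""
    for batch in batched(revids, MAX_REVIDS_PER_REQUEST):
        yield {
            "action": "query",
            "prop": "revisions",
            "revids": "|".join(str(r) for r in batch),
            "rvprop": "ids|content",
            "rvslots": "main",
            "format": "json",
            "formatversion": "2",
        }
```

With 120 revision IDs this produces three requests (50, 50, and 20 IDs) rather than one request per adjacent pair.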
Signed-off-by: Will Beason <willbeason@gmail.com>
This should help PR readability.
There is likely still some unused code, but that should be the
bulk of it.
Signed-off-by: Will Beason <willbeason@gmail.com>
This will allow making columns optional, as desired, and make
adding new columns straightforward without impacting existing
behavior.
Signed-off-by: Will Beason <willbeason@gmail.com>