Pushshift's files.pushshift.io archive has been unavailable since Reddit
cut off third-party API access in 2023, so the dumps/ pull and SHA-check
scripts no longer work. The old/ directory of pre-refactor scripts is
also obsolete, superseded by the current versions in similarities/.

README rewritten to credit Nate as the original developer, name the
current maintainers, document the directory layout, point to the CDSC
wiki for the ArcticShift/torrent-based workflow, fix several stale
script paths, and correct the tf-normalization formula (max, not sum).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>