
datasets/add_months.sh: stop before copy step to force manual verification

The script now exits after Part 2 so the copy and cleanup commands must
be run manually. This prevents the live datasets from being touched
without a deliberate verification step in between.
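
The guard can be illustrated with a minimal sketch. It assumes nothing about
the real script beyond the early `exit 0`; `run_demo` and the echoed messages
are hypothetical stand-ins, and the function body runs in a subshell so the
exit does not end the caller's shell.

```shell
# Hypothetical stand-in for add_months.sh; the names and messages are illustrative only.
# The body is wrapped in ( ... ) so `exit 0` ends only this subshell.
run_demo() (
    echo "part 2: staging written"

    # Early exit: everything below must be run by hand after verification.
    exit 0

    # Never reached automatically; stands in for the copy and cleanup steps.
    echo "copy: staging -> live"
)

run_demo
```

Because the exit status is 0, a caller sees a successful run even though the copy step was skipped.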

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 18:22:03 -07:00
parent 6b18840604
commit 526dc03732


@@ -93,8 +93,9 @@ start_spark_and_run.sh 1 submissions_part2.py \
 # --- Verify: inspect staging before copying to live -------------------------
 #
-# Stop here and check that the staging output looks right before running
-# the copy step. The live datasets are untouched at this point. Example:
+# The script stops here (exit 0 below). Check the staging output looks right
+# before running the copy step manually. The live datasets are untouched at
+# this point. Example checks:
 #
 # ls -lah "$STAGING_COMMENTS_SUB" | head
 # python3 -c "
@@ -104,6 +105,8 @@ start_spark_and_run.sh 1 submissions_part2.py \
 # print(t.column('created_utc')[0].as_py(), t.column('created_utc')[-1].as_py())
 # "
+exit 0
 # --- Copy: add staging files into live datasets -----------------------------
 #
 # Run these lines manually after verifying staging. This is the only step