datasets/add_months.sh: stop before copy step to force manual verification
The script now exits after Part 2, so the copy and cleanup commands must be run manually. This prevents the live datasets from being touched without a deliberate verification step in between.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -93,8 +93,9 @@ start_spark_and_run.sh 1 submissions_part2.py \
 
 # --- Verify: inspect staging before copying to live -------------------------
 #
-# Stop here and check that the staging output looks right before running
-# the copy step. The live datasets are untouched at this point. Example:
+# The script stops here (exit 0 below). Check the staging output looks right
+# before running the copy step manually. The live datasets are untouched at
+# this point. Example checks:
 #
 # ls -lah "$STAGING_COMMENTS_SUB" | head
 # python3 -c "
@@ -104,6 +105,8 @@ start_spark_and_run.sh 1 submissions_part2.py \
 # print(t.column('created_utc')[0].as_py(), t.column('created_utc')[-1].as_py())
 # "
 
+exit 0
+
 # --- Copy: add staging files into live datasets -----------------------------
 #
 # Run these lines manually after verifying staging. This is the only step
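The effect of the added `exit 0` can be illustrated with a small, self-contained sketch. It is not taken from add_months.sh: the temporary directories, the `pipeline.sh` name, and the `part-0000.txt` file are hypothetical stand-ins, and the build step is reduced to a single `echo`. It only shows that commands placed after an unconditional `exit 0` are never reached when the script is run as a whole, so the live directory stays empty until the copy lines are run by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the staging and live dataset directories.
demo_root=$(mktemp -d)
staging="$demo_root/staging"
live="$demo_root/live"
mkdir -p "$staging" "$live"

# Write a tiny script with the same layout: build, stop, then copy.
cat > "$demo_root/pipeline.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
staging=$1
live=$2

# Build step: output goes into staging only.
echo "new month" > "$staging/part-0000.txt"

# Verify: the script stops here so staging can be inspected by hand.
exit 0

# Copy: not reached when the script runs end to end; meant to be
# run manually after verification.
cp "$staging"/* "$live"/
EOF

bash "$demo_root/pipeline.sh" "$staging" "$live"

# Count files on each side to confirm that live was not touched.
staged_count=$(find "$staging" -type f | wc -l | tr -d ' ')
live_count=$(find "$live" -type f | wc -l | tr -d ' ')
echo "staged=$staged_count live=$live_count"
```

Running the sketch prints `staged=1 live=0`: the staging file exists, and nothing has been copied to the live side.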