
updating biberplus and olmo_batched results

mgaughan 2025-09-25 09:20:40 -05:00
parent 265b930578
commit 9d1359af36
5 changed files with 544156 additions and 3 deletions

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -9,3 +9,4 @@ _CudaDeviceProperties(name='NVIDIA A100-SXM4-80GB', major=8, minor=0, total_memo
Loading checkpoint shards: 0%| | 0/12 [00:00<?, ?it/s] Loading checkpoint shards: 8%|▊ | 1/12 [00:00<00:04, 2.32it/s] Loading checkpoint shards: 17%|█▋ | 2/12 [00:01<00:06, 1.64it/s] Loading checkpoint shards: 25%|██▌ | 3/12 [00:01<00:05, 1.66it/s] Loading checkpoint shards: 33%|███▎ | 4/12 [00:02<00:04, 1.64it/s] Loading checkpoint shards: 42%|████▏ | 5/12 [00:02<00:04, 1.68it/s] Loading checkpoint shards: 50%|█████ | 6/12 [00:03<00:03, 1.60it/s] Loading checkpoint shards: 58%|█████▊ | 7/12 [00:04<00:03, 1.65it/s] Loading checkpoint shards: 67%|██████▋ | 8/12 [00:04<00:02, 1.59it/s] Loading checkpoint shards: 75%|███████▌ | 9/12 [00:05<00:01, 1.56it/s] Loading checkpoint shards: 83%|████████▎ | 10/12 [00:06<00:01, 1.64it/s] Loading checkpoint shards: 92%|█████████▏| 11/12 [00:06<00:00, 1.76it/s] Loading checkpoint shards: 100%|██████████| 12/12 [00:06<00:00, 1.82it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
unsupervised batched olmo categorization pau at Wed Sep 24 13:25:31 CDT 2025


@@ -1,3 +1,8 @@
starting the job at: Tue Sep 23 16:37:07 CDT 2025
starting the job at: Tue Sep 23 16:48:14 CDT 2025
setting up the environment
running the biberplus labeling script
26024
26024
biberplus labeling pau
job finished, cleaning up
job pau at: Tue Sep 23 16:56:55 CDT 2025


@@ -111,10 +111,10 @@ if __name__ == "__main__":
    #assert that order has been preserved
    for _ in range(1000):
        random_index = random.randrange(len(final_discussion_df))
        assert task_description_df.iloc[random_index]["id"] == final_discussion_df.iloc[random_index]["id"]
        assert first_discussion_df.iloc[random_index]["id"] == final_discussion_df.iloc[random_index]["id"]
        #assert first_discussion_df.loc[random_index, "comment_text"] == final_discussion_df.loc[random_index, "comment_text"]
    #assert that there are the same number of rows in first_discussion_df and second_discussion_df
    assert len(task_description_df) == len(final_discussion_df)
    assert len(first_discussion_df) == len(final_discussion_df)
    final_discussion_df = final_discussion_df.drop(columns=["message"])
    # if passing the prior asserts, let's write to a csv
    final_discussion_df.to_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/092325_biberplus_complete_labels.csv", index=False)
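The loop in this hunk spot-checks that `task_description_df` and `first_discussion_df` stay row-aligned with `final_discussion_df` by sampling 1000 random positions. A minimal sketch of an exhaustive alternative follows, assuming every frame has an `id` column and is meant to match `final_discussion_df` position by position; `check_row_alignment` is a hypothetical helper, not part of the repository.

```python
import pandas as pd


def check_row_alignment(reference: pd.DataFrame, *others: pd.DataFrame, key: str = "id") -> None:
    """Hypothetical helper: assert that each frame in `others` has the same
    length as `reference` and the same `key` value at every row position."""
    ref_ids = reference[key].to_numpy()
    for i, other in enumerate(others):  # i is the 0-based index into `others`
        # Same row count, mirroring the len(...) asserts in the original script.
        assert len(other) == len(reference), (
            f"frame {i}: {len(other)} rows vs {len(reference)}"
        )
        # Element-wise positional comparison over all rows (no sampling).
        # Note: NaN ids compare unequal and will be reported as mismatches.
        mismatches = other[key].to_numpy() != ref_ids
        assert not mismatches.any(), (
            f"frame {i}: first mismatch at position {int(mismatches.argmax())}"
        )


if __name__ == "__main__":
    final_df = pd.DataFrame({"id": [10, 20, 30], "message": ["a", "b", "c"]})
    aligned_df = pd.DataFrame({"id": [10, 20, 30]})
    shuffled_df = pd.DataFrame({"id": [10, 30, 20]})

    check_row_alignment(final_df, aligned_df)  # passes silently
    try:
        check_row_alignment(final_df, aligned_df, shuffled_df)
    except AssertionError as err:
        print(err)  # frame 1: first mismatch at position 1
```

Comparing whole `id` columns as NumPy arrays is usually cheap, and unlike random sampling it cannot let a single misaligned row slip through before the CSV is written.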