updating biberplus and olmo_batched results

2025-09-25 09:20:40 -05:00 · 9d1359af36
commit 9d1359af36
parent 265b930578
5 changed files with 544156 additions and 3 deletions
--- a/p2/quest/092325_biberplus_complete_labels.csv
+++ b/p2/quest/092325_biberplus_complete_labels.csv
--- a/p2/quest/all_092225_olmo_batched_categorized.csv
+++ b/p2/quest/all_092225_olmo_batched_categorized.csv
--- a/p2/quest/batched-mw-olmo-info-cat.log
+++ b/p2/quest/batched-mw-olmo-info-cat.log
@@ -9,3 +9,4 @@ _CudaDeviceProperties(name='NVIDIA A100-SXM4-80GB', major=8, minor=0, total_memo
 
Loading checkpoint shards:   0%|          | 0/12 [00:00<?, ?it/s]
Loading checkpoint shards:   8%|▊         | 1/12 [00:00<00:04,  2.32it/s]
Loading checkpoint shards:  17%|█▋        | 2/12 [00:01<00:06,  1.64it/s]
Loading checkpoint shards:  25%|██▌       | 3/12 [00:01<00:05,  1.66it/s]
Loading checkpoint shards:  33%|███▎      | 4/12 [00:02<00:04,  1.64it/s]
Loading checkpoint shards:  42%|████▏     | 5/12 [00:02<00:04,  1.68it/s]
Loading checkpoint shards:  50%|█████     | 6/12 [00:03<00:03,  1.60it/s]
Loading checkpoint shards:  58%|█████▊    | 7/12 [00:04<00:03,  1.65it/s]
Loading checkpoint shards:  67%|██████▋   | 8/12 [00:04<00:02,  1.59it/s]
Loading checkpoint shards:  75%|███████▌  | 9/12 [00:05<00:01,  1.56it/s]
Loading checkpoint shards:  83%|████████▎ | 10/12 [00:06<00:01,  1.64it/s]
Loading checkpoint shards:  92%|█████████▏| 11/12 [00:06<00:00,  1.76it/s]
Loading checkpoint shards: 100%|██████████| 12/12 [00:06<00:00,  1.82it/s]
 Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
 This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
+unsupervised batched olmo categorization pau at Wed Sep 24 13:25:31 CDT 2025
--- a/p2/quest/cleaned_biberplus-categorization.log
+++ b/p2/quest/cleaned_biberplus-categorization.log
@@ -1,3 +1,8 @@
-starting the job at: Tue Sep 23 16:37:07 CDT 2025
+starting the job at: Tue Sep 23 16:48:14 CDT 2025
 setting up the environment
 running the biberplus labeling script
+26024
+26024
+biberplus labeling pau
+job finished, cleaning up
+job pau at: Tue Sep 23 16:56:55 CDT 2025
--- a/p2/quest/python_scripts/biberplus_labeling.py
+++ b/p2/quest/python_scripts/biberplus_labeling.py
@@ -111,10 +111,10 @@ if __name__ == "__main__":
    #assert that order has been preserved 
    for _ in range(1000):
        random_index = random.randrange(len(final_discussion_df))
-        assert task_description_df.iloc[random_index]["id"] == final_discussion_df.iloc[random_index]["id"]
+        assert first_discussion_df.iloc[random_index]["id"] == final_discussion_df.iloc[random_index]["id"]
        #assert first_discussion_df.loc[random_index, "comment_text"] == final_discussion_df.loc[random_index, "comment_text"]
    #assert that there are the same number of rows in first_discussion_df and second_discussion_df
-    assert len(task_description_df) == len(final_discussion_df)
+    assert len(first_discussion_df) == len(final_discussion_df)
    final_discussion_df = final_discussion_df.drop(columns=["message"])
    # if passing the prior asserts, let's write to a csv
    final_discussion_df.to_csv("/home/nws8519/git/mw-lifecycle-analysis/p2/quest/092325_biberplus_complete_labels.csv", index=False)
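
Review note on the hunk above: the assertions sample 1000 random positions, so a misalignment confined to a few rows could go unnoticed. A minimal sketch of an exhaustive check follows, assuming both frames carry an `id` column as in the diff. The helper name `assert_same_order` and the toy frames are hypothetical and not part of this commit.

```python
import pandas as pd


def assert_same_order(left: pd.DataFrame, right: pd.DataFrame, key: str = "id") -> None:
    """Raise AssertionError unless both frames have the same length and key order."""
    # Check the row count first so the positional comparison is well defined.
    assert len(left) == len(right), f"row count differs: {len(left)} vs {len(right)}"
    # Compare the key columns position by position, ignoring index labels.
    mismatched = left[key].to_numpy() != right[key].to_numpy()
    assert not mismatched.any(), f"{int(mismatched.sum())} rows out of order"


# Toy data standing in for first_discussion_df and final_discussion_df.
first = pd.DataFrame({"id": [3, 1, 2], "comment_text": ["a", "b", "c"]})
final = pd.DataFrame({"id": [3, 1, 2], "label": [0.1, 0.2, 0.3]})
assert_same_order(first, final)
```

Comparing the underlying arrays rather than the Series avoids index-alignment surprises when the two frames have different index labels.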