now with updated categorizations

mgaughan 2025-06-02 11:29:59 -05:00
parent f9e3075b2b
commit c047c0a5cc
4 changed files with 530 additions and 511 deletions

@ -7,3 +7,21 @@ Copying blob sha256:9fe6e2e61518cba6844870c03b285737daec35e62baf25ae7744629ed3a7
Copying blob sha256:41f16248e682693ff20b3032c1d5e5541cc87c5af898ae2ff9b24d2940e59100
Copying blob sha256:95d7b781703928cf3c4eece39d800cccb76728c375fedf51ecd83833fb25e458
Copying blob sha256:8f6c9048534734f4c873935293b7296225846ceb31c1a158400a67ea170dde7f
Copying blob sha256:ab17245097e491b9368790714f9d90ed447bf0973bd677cfe6f2456d62b72a13
Copying blob sha256:dfecd7e9912b76ed460b8edd5a85f1943666e38a973ab5458177cf2c7c3110e3
Copying blob sha256:464a8f74544589bf7b57f9a4cadcb6681e5ed00758f6c35025e691df4e88e890
Copying blob sha256:61d26dce6d4129f40549457a063c82aca2c606d73ef156d5ac7e495e1d52530a
Copying blob sha256:227a9906e6cccfa3aee837559aeb3fdcbf4409286dd4dd0a37287cfd483c37f6
Copying blob sha256:c826e867602d3c7a5d3b8a552e49d51c58cccf42c31d016a660a50b7f451ef09
Copying blob sha256:d40507eacecbbd8647bcee51d03f8b8cc86044d73cb72448112d49a08b8feaac
Copying blob sha256:93c7cb8303f3b8ca1165c92b4b55a08973e8bd1a1360dd7bc3cb8bd18804d2a8
Copying blob sha256:7f2d4a3887cae1984105738d5887b3ed325095939dfe31e89d5b47212b7f6479
Copying blob sha256:167c57c419bc5ef23ffe823e05c1cac741246ef69352f36a2724e2f4c276f52b
Copying blob sha256:1b456af08bb7c15512a9be77ce1ed44ce87f2c52315c99cbb2a99dd786adb4cb
Copying blob sha256:054fcf1bbe967cf874bfa40161b3c559f8cf03ca1e05a532a69a8edca4d8d0e5
Copying blob sha256:e26ee59fb49e43a8046b8c3812c52cc62bb8e5772e3323ff84c76a5715668c36
Copying blob sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
Copying blob sha256:da54e4cf5022248356962228df4c357d330f5f87e2ebc0fcff2b766400721cef
Copying blob sha256:684176763e41ba50c8aa61c6e6eb6aec1ac35eea61710971f410dd1a5a2953a8
Copying blob sha256:78c24341e0f9d5ae00c21f5dd0a35adf62f5c1ba2618b5e0c7e45994eb69f6b5
Copying blob sha256:435f630eb19ee65f1b1e2db0d34b278037511d4344ca482b720c6bb1f70b8f58

@ -12,7 +12,7 @@ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B").to(device)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
#priming prompt
prompt_1 = "For the GIVEN DATA, Please categorize it based on the following numbered characteristics: \n\n 1: YES/NO (Characteristic 1. This is an English language empirical study. English language empirical studies analyze observational or experimental data. We are exlcuding literature reviews from this definition.) \n 2: YES/NO (Characteristic 2. This focuses on free and open source software (FOSS). The focus of this paper is on FOSS projects and ecosystems.) \n 3: YES/NO (Characteristic 3. This focuses on FOSS project evolution. FOSS project evolution is the study of longitudinal changes to the characteristics of free and open source projects.) \n 4: YES/NO (Characteristic 4. This focuses on FOSS project adaptation. FOSS project adaptation describes the intentional changes made to the characteristics of FOSS projects to better align with the project's broader environment.) \n\n Only respond with the appropriate number followed by 'YES' if the characteristic is present in the provided data or 'NO' if it is not (e.g. '1. NO; 2. YES;'). Do not provide any additional information."
prompt_1 = "For the GIVEN DATA, Please categorize it based on the following numbered characteristics: \n\n 1: YES/NO (Characteristic 1. This is an English language empirical study. English language empirical studies discuss data or observations.) \n 2: YES/NO (Characteristic 2. This focuses on free and open source software (FOSS). The focus of this paper is on free or open source software projects and ecosystems.) \n 3: YES/NO (Characteristic 3. This focuses on FOSS project evolution. FOSS project evolution describes changes to free and open source projects and ecosystems.) \n 4: YES/NO (Characteristic 4. This focuses on FOSS project adaptation. FOSS project adaptation describes the intentional changes made by projects to better align with the project's broader environment.) \n\n Only respond with the appropriate number followed by 'YES' if the characteristic is present in the provided data or 'NO' if it is not (e.g. '1. YES; 2. NO;'). Do not provide any additional information."
example_4 = "Example 4: TITLE - Analysis of Open Source Software Evolution Using Evolution Curve Method \n ABSTRACT - Design and evolution of modem information systems is influenced by many factors: technical, organizational, social, and psychological. This is especially true for open source software systems (OSSS), when many developers from different backgrounds interact, share their ideas and contribute towards the development and improvement of a software product. The evolution of all OSSS is a continuous process of source code development, adaptation, improvement and maintenance. Studying changes to the various characteristics of source code can help us understand the evolution of a software system. In this paper, the software evolution process is analyzed using a proposed Evolution curve (E-curve) method, which is based on information theoretic metrics of source code. The method allows identifying major evolution stages and transition points of an analyzed software system. The application of the E-curves is demonstrated for the eMule system. .\n CATEGORIES: 1. YES; 2. YES; 3.YES; 4. NO"
@ -62,4 +62,4 @@ with open("cites/053025_man_filtered_dedup.csv", mode='r', newline='') as file:
array_of_categorizations.append(cite_dict)
#CSV everything
df = pd.DataFrame(array_of_categorizations)
df.to_csv('053025_olmo_categorized_citations.csv', index=False)
df.to_csv('060225_olmo_categorized_citations.csv', index=False)
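
The prompt in this hunk asks the model to answer in the form '1. YES; 2. NO;', and the few-shot example ends with "CATEGORIES: 1. YES; 2. YES; 3.YES; 4. NO". The diff does not show how the script turns that text into columns, so the following is only a minimal sketch of one way to parse it. The helper name `parse_categories` and the regular expression are assumptions for illustration, not code from this commit.

```python
import re

# Hypothetical helper, not part of the commit: matches fragments such as
# "1. YES", "3.YES" or "4: NO" for characteristic numbers 1 through 4.
_CATEGORY_PATTERN = re.compile(r"([1-4])\s*[.:]\s*(YES|NO)\b", re.IGNORECASE)


def parse_categories(response: str) -> dict[int, bool]:
    """Map each characteristic number to True (YES) or False (NO).

    If a number appears more than once, the first occurrence wins.
    Numbers the model did not answer are simply absent from the result.
    """
    result: dict[int, bool] = {}
    for number, answer in _CATEGORY_PATTERN.findall(response):
        result.setdefault(int(number), answer.upper() == "YES")
    return result


# Usage with the answer format from the few-shot example in the prompt.
parsed = parse_categories("CATEGORIES: 1. YES; 2. YES; 3.YES; 4. NO")
print(parsed)
```

Because a causal LM often echoes part of the prompt, a caller would probably want to pass only the newly generated text to a parser like this; otherwise the "1: YES/NO" template in the prompt itself would be read as an answer.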

@ -1,9 +1,9 @@
starting the job at: Fri May 30 21:46:19 CDT 2025
starting the job at: Mon Jun 2 09:24:01 CDT 2025
setting up the environment
running the p1 categorization script
cuda
NVIDIA A100-SXM4-80GB
_CudaDeviceProperties(name='NVIDIA A100-SXM4-80GB', major=8, minor=0, total_memory=81153MB, multi_processor_count=108, uuid=841be301-db75-9627-af0f-04d8965fd651, L2_cache_size=40MB)
Loading checkpoint shards: 0%| | 0/6 [00:00<?, ?it/s] Loading checkpoint shards: 17%|█▋ | 1/6 [00:00<00:04, 1.04it/s] Loading checkpoint shards: 33%|███▎ | 2/6 [00:02<00:04, 1.16s/it] Loading checkpoint shards: 50%|█████ | 3/6 [00:03<00:03, 1.32s/it] Loading checkpoint shards: 67%|██████▋ | 4/6 [00:05<00:02, 1.43s/it] Loading checkpoint shards: 83%|████████▎ | 5/6 [00:06<00:01, 1.45s/it] Loading checkpoint shards: 100%|██████████| 6/6 [00:07<00:00, 1.28s/it] Loading checkpoint shards: 100%|██████████| 6/6 [00:07<00:00, 1.30s/it]
NVIDIA A100-PCIE-40GB
_CudaDeviceProperties(name='NVIDIA A100-PCIE-40GB', major=8, minor=0, total_memory=40442MB, multi_processor_count=108, uuid=c91b110a-9eb1-15b6-ff0a-7aeb47b26ff0, L2_cache_size=40MB)
Loading checkpoint shards: 0%| | 0/6 [00:00<?, ?it/s] Loading checkpoint shards: 17%|█▋ | 1/6 [00:00<00:04, 1.24it/s] Loading checkpoint shards: 33%|███▎ | 2/6 [00:01<00:03, 1.08it/s] Loading checkpoint shards: 50%|█████ | 3/6 [00:02<00:02, 1.05it/s]
job finished, cleaning up
job pau at: Fri May 30 23:21:25 CDT 2025
job pau at: Mon Jun 2 11:08:43 CDT 2025