mw-lifecycle-analysis/p2/quest/archived_data/101325_subcomment_neurobiber-pca.log
2025-10-24 09:03:54 -07:00

starting the job at: Tue Oct 14 15:54:24 CDT 2025
setting up the environment
running the neurobiber labeling script
1 [Change 86685 merged by jenkins-bot:\nFollow-u...
2 [*** Bug 54785 has been marked as a duplicate ...
3 [Change 86685 had a related patch set uploaded...
5 [**Wikifram** wrote:\n\nAllright, thanks to bo...
6 [(In reply to comment #4)\nQUOTE\n\nVE product...
...
25022 [Er... drag and drop from what?, Is there no n...
25023 [Could you attach a screenshot please?, Drag &...
25025 [Sorry for not reply-ing., I did a test and co...
25026 [SCREEN_NAME: Please answer.]
25027 [I cannot replicate this., What's the name of ...
Name: olmo_cleaned_sentences, Length: 21901, dtype: object
[[18. ]
[ 6.5]
[23. ]
...
[ 5.5]
[ 3. ]
[ 6. ]]
Number of PCs explaining 90% variance: 24
Variance of each PCA component: [273.55786883 135.16197459 82.94008657 63.12754897 60.39119505
38.84258991 32.35268417 26.32979149 21.57186105 18.691479
16.21404524 13.63887204 13.3960516 11.40372708 10.25820109
9.13513531 8.8549811 8.29863619 7.99933399 7.06165956
6.73377968 6.4742109 5.92152116 5.75533066]
PC1:
normalized_CAP: 0.670
normalized_NNP: 0.604
median_sentence_length: -0.283
normalized_DET: -0.142
normalized_PREP: -0.122
normalized_PIN: -0.122
normalized_ART: -0.089
normalized_VPRT: -0.082
normalized_RB: -0.077
normalized_PRP: -0.071
PC2:
median_sentence_length: 0.929
normalized_NNP: 0.319
normalized_RB: -0.074
normalized_VPRT: -0.070
normalized_DET: -0.066
normalized_AUXB: -0.055
normalized_PRP: -0.045
normalized_SBJP: -0.045
normalized_X: 0.038
normalized_CAP: 0.035
PC3:
normalized_NN: 0.750
normalized_NNP: -0.291
normalized_RB: -0.266
normalized_PRP: -0.232
normalized_SBJP: -0.232
normalized_CAP: 0.211
normalized_VPRT: -0.169
normalized_FPP1: -0.117
normalized_NUM: 0.106
normalized_INF: -0.097
PC4:
normalized_CAP: 0.577
normalized_PREP: 0.426
normalized_PIN: 0.426
normalized_NNP: -0.281
normalized_PRP: 0.187
normalized_SBJP: 0.187
median_sentence_length: 0.159
normalized_X: -0.148
normalized_RB: 0.141
normalized_INF: 0.128
PC5:
normalized_PIN: 0.507
normalized_PREP: 0.507
normalized_NNP: 0.435
normalized_CAP: -0.349
normalized_RB: -0.256
median_sentence_length: -0.147
normalized_CONJ: 0.125
normalized_SBJP: -0.120
normalized_PRP: -0.120
normalized_VPRT: -0.100
PC6:
normalized_DET: 0.618
normalized_ART: 0.383
normalized_X: -0.278
normalized_NN: 0.273
normalized_VPRT: 0.261
normalized_NNP: 0.246
normalized_AUXB: 0.215
normalized_NUM: -0.191
normalized_INF: -0.163
normalized_INDA: 0.156
PC7:
normalized_NN: 0.477
normalized_PRP: 0.459
normalized_SBJP: 0.459
normalized_NNP: 0.247
normalized_FPP1: 0.236
normalized_DET: -0.196
normalized_AUXB: -0.171
normalized_CAP: -0.163
normalized_PASS: -0.138
normalized_PIT: 0.126
PC8:
normalized_RB: 0.781
normalized_NN: 0.265
normalized_DET: -0.188
normalized_PRP: -0.187
normalized_SBJP: -0.186
normalized_JJ: -0.169
normalized_NNP: 0.154
normalized_X: -0.153
normalized_TIME: 0.139
normalized_ART: -0.136
PC9:
normalized_JJ: 0.672
normalized_INF: 0.353
normalized_VPRT: -0.324
normalized_PASS: -0.219
normalized_AUXB: -0.218
normalized_NUM: -0.214
normalized_ART: 0.214
normalized_CONJ: -0.147
normalized_RB: 0.132
normalized_PEAS: -0.117
PC10:
normalized_INF: 0.652
normalized_JJ: -0.543
normalized_VPRT: -0.298
normalized_DET: 0.248
normalized_TO: 0.131
normalized_ART: 0.128
normalized_PRIV: 0.108
normalized_NUM: 0.086
normalized_RB: -0.077
normalized_POMD: 0.072
PC11:
normalized_INF: 0.420
normalized_VPRT: 0.383
normalized_AUXB: 0.379
normalized_ART: -0.261
normalized_JJ: 0.251
normalized_RB: -0.249
normalized_VBD: -0.247
normalized_X: -0.223
normalized_DET: -0.212
normalized_PASS: 0.174
PC12:
sentence_count: 0.651
normalized_X: -0.619
normalized_VPRT: -0.180
normalized_PUBV: 0.169
normalized_RB: -0.115
normalized_CONJ: 0.114
normalized_INF: -0.104
normalized_CCONJ: 0.100
normalized_QUOT: 0.099
normalized_DET: -0.091
PC13:
sentence_count: 0.637
normalized_X: 0.496
normalized_VBD: -0.299
normalized_NUM: -0.287
normalized_PUBV: -0.223
normalized_JJ: -0.198
normalized_VPRT: 0.186
normalized_CONJ: -0.099
normalized_QUOT: 0.067
normalized_PASS: -0.061
PC14:
normalized_NUM: 0.714
normalized_VBD: -0.354
normalized_VPRT: 0.233
normalized_AUXB: -0.186
normalized_PASS: -0.171
normalized_ART: 0.153
normalized_UH: -0.150
normalized_RB: 0.141
normalized_INDA: 0.138
normalized_PUBV: -0.134
PC15:
normalized_QUOT: 0.422
normalized_VBD: -0.380
normalized_AUXB: -0.331
sentence_count: -0.322
normalized_CONT: 0.315
normalized_UH: 0.255
normalized_NUM: -0.221
normalized_PASS: -0.221
normalized_X: -0.206
normalized_VPRT: 0.154
PC16:
normalized_PUBV: 0.481
normalized_CONJ: -0.394
normalized_UH: -0.360
normalized_VBD: 0.317
normalized_QUOT: 0.267
normalized_VPRT: 0.248
normalized_CONT: 0.201
normalized_NUM: 0.151
normalized_PASS: -0.137
normalized_TO: 0.128
PC17:
normalized_QUOT: 0.520
normalized_CONT: 0.417
normalized_PUBV: -0.301
normalized_PGAS: -0.290
normalized_UH: -0.260
normalized_CONJ: 0.234
normalized_VBD: 0.200
normalized_NOMZ: -0.194
normalized_PASS: 0.193
normalized_AUXB: 0.175
PC18:
normalized_CONJ: 0.631
normalized_PUBV: 0.523
normalized_NUM: -0.253
normalized_PGAS: -0.211
normalized_VPRT: 0.168
normalized_X: 0.160
normalized_ART: 0.155
normalized_DEMP: -0.126
normalized_UH: -0.118
normalized_TIME: -0.106
PC19:
normalized_UH: 0.659
normalized_PGAS: -0.517
normalized_VBD: 0.237
normalized_CCONJ: -0.196
normalized_CONJ: -0.175
normalized_NOMZ: -0.153
normalized_VPRT: 0.149
normalized_ART: 0.109
normalized_INDA: 0.101
normalized_RB: 0.099
PC20:
normalized_ART: 0.461
normalized_DET: -0.342
normalized_DEMO: -0.294
normalized_INDA: 0.293
normalized_DEMP: -0.288
normalized_AUXB: 0.230
normalized_PIT: 0.222
normalized_FPP1: -0.215
normalized_PGAS: 0.208
normalized_CCONJ: 0.185
PC21:
normalized_PGAS: 0.594
normalized_CCONJ: -0.353
normalized_UH: 0.330
normalized_CONJ: 0.272
normalized_AUXB: 0.250
normalized_PRIV: 0.241
normalized_BEMA: 0.153
normalized_TIME: -0.141
normalized_PROD: -0.130
normalized_NUM: 0.125
PC22:
normalized_PRIV: 0.445
normalized_QUES: -0.422
normalized_CCONJ: 0.395
normalized_VPRT: 0.242
normalized_FPP1: 0.221
normalized_AUXB: -0.207
normalized_VBD: 0.200
normalized_BEMA: -0.178
normalized_PIT: -0.151
normalized_SPP2: -0.148
PC23:
normalized_NOMZ: 0.504
normalized_PRIV: 0.457
normalized_CCONJ: -0.327
normalized_PUBV: -0.283
normalized_NUM: -0.184
normalized_VBD: 0.180
normalized_SCONJ: 0.170
normalized_UH: -0.168
normalized_DEMP: -0.161
normalized_PGAS: -0.161
PC24:
normalized_CCONJ: 0.506
normalized_QUES: 0.414
normalized_CONJ: 0.251
normalized_PASS: -0.238
normalized_BEMA: 0.207
normalized_WH: 0.207
normalized_VBD: 0.186
normalized_DEMO: -0.180
normalized_PEAS: -0.164
normalized_SCONJ: 0.161
Top 10 PC1 values:
PC1 PC2 ... AuthorPHID date_created
23531 123.243897 22.112164 ... PHID-USER-arjqb24x4oae7awzpfp6 1424754141
707 123.226678 22.102265 ... PHID-USER-pun3sjvg3cemjzbgyo2t 1363132183
744 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1353551242
749 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1353384355
2243 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1356175107
5921 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1353366778
5933 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1353123761
5935 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1353386649
10080 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1366298361
10418 123.226678 22.102265 ... PHID-USER-fovtl67ew4l4cc3oeypc 1355363288
[10 rows x 34 columns]
Bottom 10 PC1 values:
PC1 PC2 ... AuthorPHID date_created
24812 -131.318535 438.637876 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463441072
24813 -131.130989 438.728132 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463441050
13983 -88.027511 274.892016 ... PHID-USER-v7vgzvvcw7v2umf737ri 1380947348
16510 -82.500013 294.909402 ... PHID-USER-izojihzr4ja3jsgzn5wv 1354470131
161 -68.446710 197.206426 ... PHID-USER-hyfm4swq76s4j642w46x 1374730027
24815 -60.440128 175.352637 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463439992
6163 -59.523505 195.679514 ... PHID-USER-4bjsher5mqcoikeqnnec 1379611711
22005 -59.492044 211.972278 ... PHID-USER-maceogqtxg4qfaefx7wd 1440633395
24010 -53.793798 153.114760 ... PHID-USER-lhtlnmkdbzlz6pbxaqdd 1428469742
24009 -53.614161 153.284397 ... PHID-USER-lhtlnmkdbzlz6pbxaqdd 1428538077
[10 rows x 34 columns]
Top 10 PC2 values:
PC1 PC2 ... AuthorPHID date_created
24813 -131.130989 438.728132 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463441050
24812 -131.318535 438.637876 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463441072
16510 -82.500013 294.909402 ... PHID-USER-izojihzr4ja3jsgzn5wv 1354470131
13983 -88.027511 274.892016 ... PHID-USER-v7vgzvvcw7v2umf737ri 1380947348
22005 -59.492044 211.972278 ... PHID-USER-maceogqtxg4qfaefx7wd 1440633395
161 -68.446710 197.206426 ... PHID-USER-hyfm4swq76s4j642w46x 1374730027
6163 -59.523505 195.679514 ... PHID-USER-4bjsher5mqcoikeqnnec 1379611711
20858 -52.549327 192.146265 ... PHID-USER-22bsa5u75jz3ci3wnplu 1441031208
24815 -60.440128 175.352637 ... PHID-USER-fo56wm4wxiwpoofn2xdu 1463439992
18294 -43.267655 159.973982 ... PHID-USER-vk6mlmacfhx77egryy5i 1394419981
[10 rows x 34 columns]
Bottom 10 PC2 values:
PC1 PC2 ... AuthorPHID date_created
17259 -12.413915 -20.310670 ... PHID-USER-6vzzsmi22zem6yttr6vp 1321220595
22246 2.436022 -19.030642 ... PHID-USER-2nnm76h4ykalvvref2ye 1461480989
24780 -8.420485 -18.295879 ... PHID-USER-lsveyqlsb4acoowxr5yj 1420344576
7427 12.144652 -18.033451 ... PHID-USER-wz5bw3q6zykhqbbeohzq 1375791780
7055 -1.553566 -17.924389 ... PHID-USER-cfsvvgbtlqnbt2yokfjf 1377020909
23122 9.656987 -17.642747 ... PHID-USER-2nnm76h4ykalvvref2ye 1467721812
16776 6.551795 -17.537527 ... PHID-USER-6vzzsmi22zem6yttr6vp 1317838205
7471 -0.812161 -17.516875 ... PHID-USER-wkpnidxoctuhawexig5p 1386166246
13670 3.270330 -17.516754 ... PHID-USER-5dwuaigmkz2vzg65lape 1401902866
20682 3.694061 -17.391146 ... PHID-USER-uciss2jl2e4ifxqqk7wk 1440083315
[10 rows x 34 columns]
job finished, cleaning up
job pau at: Tue Oct 14 15:54:56 CDT 2025