running PCA on subcomment values, adding new plot for closed_relevance
parent e29d4bf59c
commit b21ecb02c3
112799
p2/quest/092325_subcomment_PCA_df.csv
Normal file
File diff suppressed because one or more lines are too long
@@ -1,4 +1,4 @@
-starting the job at: Thu Sep 25 09:36:43 CDT 2025
+starting the job at: Thu Sep 25 10:05:44 CDT 2025
 setting up the environment
 running the neurobiber labeling script
 Variance of each PCA component: [44.08472997 25.31736287 20.0163717 11.80556907 8.85200058 8.36660391
@@ -203,62 +203,62 @@ PC18:
 BIN_PRIV: -0.116
 BIN_FPP1: 0.103
 Top 10 PC1 values:
-PC1 PC2 ... priority closed_relevance
-19873 40.267200 26.528755 ... Medium False
-24120 34.012764 7.436658 ... Low False
-24529 33.020514 7.624464 ... Needs Triage False
-25549 33.018302 7.622737 ... Medium False
-24528 33.016089 7.621010 ... Needs Triage True
-23238 31.348286 5.402173 ... Medium False
-18729 29.627919 4.690955 ... Needs Triage True
-23016 29.595518 8.870229 ... Medium False
-14849 28.191116 6.625144 ... Low False
-21214 28.191116 6.625144 ... Low True
+PC1 PC2 PC3 ... id week_index priority
+19873 40.267200 26.528755 -11.406833 ... 75953 -32 Medium
+24120 34.012764 7.436658 -10.042571 ... 101814 -4 Low
+24529 33.020514 7.624464 0.683570 ... 90329 -19 Needs Triage
+25549 33.018302 7.622737 0.683000 ... 90328 -19 Medium
+24528 33.016089 7.621010 0.682431 ... 90330 -19 Needs Triage
+23238 31.348286 5.402173 -6.263101 ... 107627 4 Medium
+18729 29.627919 4.690955 -5.130935 ... 60818 -80 Needs Triage
+23016 29.595518 8.870229 -12.256826 ... 110277 7 Medium
+14849 28.191116 6.625144 -7.271761 ... 56457 3 Low
+21214 28.191116 6.625144 -7.271761 ... 56457 -93 Low

-[10 rows x 26 columns]
+[10 rows x 25 columns]

 Bottom 10 PC1 values:
-PC1 PC2 ... priority closed_relevance
-24481 -16.862586 13.863453 ... Needs Triage True
-23053 -16.174624 12.133559 ... Medium False
-23838 -15.421295 13.308099 ... Low False
-25791 -15.127553 14.746424 ... Medium True
-7451 -14.574686 5.821303 ... Medium False
-24467 -13.905417 7.936462 ... Needs Triage True
-23436 -13.827143 7.507781 ... Medium False
-24293 -13.667374 0.891979 ... Unbreak Now! True
-11814 -13.418003 7.854756 ... Low False
-968 -13.358491 0.305388 ... Needs Triage True
+PC1 PC2 PC3 ... id week_index priority
+24481 -16.862586 13.863453 8.545495 ... 92256 -17 Needs Triage
+23053 -16.174624 12.133559 0.579284 ... 110020 7 Medium
+23838 -15.421295 13.308099 -0.838241 ... 109719 7 Low
+25791 -15.127553 14.746424 22.119623 ... 85189 -28 Medium
+7451 -14.574686 5.821303 4.386196 ... 53758 2 Medium
+24467 -13.905417 7.936462 2.001860 ... 92606 -16 Needs Triage
+23436 -13.827143 7.507781 -2.056608 ... 103919 -1 Medium
+24293 -13.667374 0.891979 -6.145701 ... 88897 -21 Unbreak Now!
+11814 -13.418003 7.854756 -1.595019 ... 52497 0 Low
+968 -13.358491 0.305388 -3.980203 ... 55409 8 Needs Triage

-[10 rows x 26 columns]
+[10 rows x 25 columns]
 Top 10 PC2 values:
-PC1 PC2 ... priority closed_relevance
-25606 6.196829 29.809964 ... Medium True
-21956 27.542757 27.763075 ... Needs Triage True
-25078 -4.462216 27.186434 ... High False
-19873 40.267200 26.528755 ... Medium False
-25820 -3.022591 23.093162 ... Medium True
-25814 20.151634 22.681554 ... Medium True
-13345 6.035595 21.910339 ... Lowest NaN
-22013 6.861197 21.673434 ... Needs Triage True
-23022 0.808467 21.111863 ... Medium False
-21966 -7.056224 20.953599 ... Needs Triage True
+PC1 PC2 PC3 ... id week_index priority
+25606 6.196829 29.809964 23.877767 ... 88139 -22 Medium
+21956 27.542757 27.763075 7.924919 ... 105099 0 Needs Triage
+25078 -4.462216 27.186434 1.860348 ... 85326 -27 High
+19873 40.267200 26.528755 -11.406833 ... 75953 -32 Medium
+25820 -3.022591 23.093162 -0.361349 ... 78160 -30 Medium
+25814 20.151634 22.681554 3.346066 ... 78837 -29 Medium
+13345 6.035595 21.910339 -6.417684 ... 51999 -2 Lowest
+22013 6.861197 21.673434 -7.690901 ... 103771 -1 Needs Triage
+23022 0.808467 21.111863 12.735632 ... 110276 7 Medium
+21966 -7.056224 20.953599 3.715673 ... 104656 0 Needs Triage

-[10 rows x 26 columns]
+[10 rows x 25 columns]

 Bottom 10 PC2 values:
-PC1 PC2 ... priority closed_relevance
-3134 5.606805 -12.562127 ... High True
-654 -0.797645 -12.364185 ... Unbreak Now! True
-16289 -0.897011 -12.328128 ... Medium False
-1207 4.714780 -12.127148 ... Needs Triage True
-1885 15.889004 -12.071062 ... Needs Triage True
-18211 6.521166 -11.920065 ... Needs Triage True
-2934 0.069845 -11.739971 ... High False
-25122 -1.657588 -11.388235 ... Medium True
-13276 15.441209 -11.380360 ... Lowest False
-2109 -2.166594 -11.371418 ... Needs Triage True
+PC1 PC2 PC3 ... id week_index priority
+3134 5.606805 -12.562127 3.104184 ... 54102 3 High
+654 -0.797645 -12.364185 4.558365 ... 49434 -11 Unbreak Now!
+16289 -0.897011 -12.328128 10.193536 ... 43224 -45 Medium
+1207 4.714780 -12.127148 4.095656 ... 54103 3 Needs Triage
+1885 15.889004 -12.071062 -5.011946 ... 52106 -1 Needs Triage
+18211 6.521166 -11.920065 4.671982 ... 73532 -40 Needs Triage
+2934 0.069845 -11.739971 3.000846 ... 54499 4 High
+25122 -1.657588 -11.388235 -4.465696 ... 97316 -10 Medium
+13276 15.441209 -11.380360 -3.592736 ... 52804 0 Lowest
+2109 -2.166594 -11.371418 -0.979264 ... 49816 -9 Needs Triage

-[10 rows x 26 columns]
+[10 rows x 25 columns]
 job finished, cleaning up
-job pau at: Thu Sep 25 09:37:24 CDT 2025
+job pau at: Thu Sep 25 10:06:30 CDT 2025
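In the Top 10 PC1 table above, rows 14849 and 21214 carry identical PC1 to PC3 scores and the same id (56457) but different week_index values. A minimal sketch of how such repeated coordinates could be surfaced with pandas follows. The three-row frame is a hand-copied miniature of the log output, and the helper name `shared_coordinates` is hypothetical, not part of the committed script.

```python
import pandas as pd

# Hand-copied miniature of the printed table (assumption: only these
# five columns matter for the check; the real frame has 25 columns).
plot_df = pd.DataFrame(
    {
        "PC1": [28.191116, 28.191116, 33.020514],
        "PC2": [6.625144, 6.625144, 7.624464],
        "PC3": [-7.271761, -7.271761, 0.683570],
        "id": [56457, 56457, 90329],
        "week_index": [3, -93, -19],
    },
    index=[14849, 21214, 24529],
)


def shared_coordinates(df, cols=("PC1", "PC2", "PC3")):
    """Return every row whose values on `cols` also occur in another row."""
    mask = df.duplicated(subset=list(cols), keep=False)
    return df[mask]


dupes = shared_coordinates(plot_df)
print(dupes[["id", "week_index"]])
```

Passing `keep=False` marks every member of a repeated group rather than only the later copies, which makes it easier to see which ids repeat across weeks.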
265
p2/quest/description_092525_neurobiber-pca.log
Normal file
@@ -0,0 +1,265 @@
+starting the job at: Thu Sep 25 09:52:47 CDT 2025
+setting up the environment
+running the neurobiber labeling script
+Variance of each PCA component: [259.38215213 83.11803664 67.16301107 61.78747188 38.94875996
+32.78688889 26.45592105 21.9280629 18.734197 16.29485568
+13.48304855 11.50594609 10.77855857 9.30674176 8.96113511
+8.35521401 8.17815209 7.13194427]
+PC1:
+BIN_CAP: 0.680
+BIN_NNP: 0.647
+BIN_DET: -0.151
+BIN_PREP: -0.128
+BIN_PIN: -0.128
+BIN_VPRT: -0.091
+BIN_ART: -0.090
+BIN_RB: -0.086
+BIN_PRP: -0.077
+BIN_SBJP: -0.077
+PC2:
+BIN_NN: 0.744
+BIN_NNP: -0.320
+BIN_RB: -0.256
+BIN_CAP: 0.242
+BIN_PRP: -0.224
+BIN_SBJP: -0.224
+BIN_VPRT: -0.163
+BIN_FPP1: -0.113
+BIN_NUM: 0.104
+BIN_INF: -0.092
+PC3:
+BIN_CAP: 0.661
+BIN_NNP: -0.491
+BIN_RB: 0.266
+BIN_PRP: 0.223
+BIN_SBJP: 0.223
+BIN_VPRT: 0.137
+BIN_X: -0.128
+BIN_PIN: 0.125
+BIN_PREP: 0.125
+BIN_FPP1: 0.124
+PC4:
+BIN_PIN: 0.649
+BIN_PREP: 0.649
+BIN_NNP: 0.256
+BIN_CONJ: 0.157
+BIN_RB: -0.156
+BIN_NN: -0.089
+BIN_TO: 0.081
+BIN_X: -0.078
+BIN_VPRT: -0.057
+BIN_INF: 0.052
+PC5:
+BIN_DET: 0.622
+BIN_ART: 0.381
+BIN_X: -0.273
+BIN_VPRT: 0.264
+BIN_NN: 0.262
+BIN_NNP: 0.243
+BIN_AUXB: 0.222
+BIN_NUM: -0.187
+BIN_INF: -0.164
+BIN_INDA: 0.158
+PC6:
+BIN_NN: 0.486
+BIN_PRP: 0.464
+BIN_SBJP: 0.464
+BIN_FPP1: 0.239
+BIN_NNP: 0.238
+BIN_DET: -0.179
+BIN_AUXB: -0.174
+BIN_PASS: -0.144
+BIN_CAP: -0.135
+BIN_PIT: 0.128
+PC7:
+BIN_RB: 0.787
+BIN_NN: 0.266
+BIN_PRP: -0.188
+BIN_SBJP: -0.188
+BIN_DET: -0.182
+BIN_NNP: 0.154
+BIN_JJ: -0.153
+BIN_X: -0.150
+BIN_TIME: 0.134
+BIN_ART: -0.129
+PC8:
+BIN_JJ: 0.667
+BIN_INF: 0.352
+BIN_VPRT: -0.326
+BIN_ART: 0.234
+BIN_PASS: -0.220
+BIN_AUXB: -0.219
+BIN_NUM: -0.210
+BIN_CONJ: -0.149
+BIN_RB: 0.126
+BIN_PEAS: -0.118
+PC9:
+BIN_INF: 0.633
+BIN_JJ: -0.559
+BIN_VPRT: -0.310
+BIN_DET: 0.251
+BIN_ART: 0.129
+BIN_TO: 0.127
+BIN_PRIV: 0.100
+BIN_NUM: 0.084
+BIN_RB: -0.075
+BIN_POMD: 0.075
+PC10:
+BIN_INF: 0.443
+BIN_AUXB: 0.372
+BIN_VPRT: 0.368
+BIN_ART: -0.256
+BIN_RB: -0.249
+BIN_JJ: 0.247
+BIN_VBD: -0.246
+BIN_X: -0.231
+BIN_DET: -0.211
+BIN_PASS: 0.171
+PC11:
+BIN_X: 0.793
+BIN_PUBV: -0.266
+BIN_VPRT: 0.258
+BIN_VBD: -0.245
+BIN_NUM: -0.211
+BIN_CONJ: -0.157
+BIN_JJ: -0.145
+BIN_UH: -0.105
+BIN_INF: 0.103
+BIN_NOMZ: -0.079
+PC12:
+BIN_NUM: 0.765
+BIN_VBD: -0.239
+BIN_UH: -0.217
+BIN_VPRT: 0.206
+BIN_QUOT: -0.181
+BIN_RB: 0.161
+BIN_INDA: 0.145
+BIN_PGAS: -0.145
+BIN_JJ: 0.136
+BIN_ART: 0.135
+PC13:
+BIN_VBD: 0.468
+BIN_QUOT: -0.433
+BIN_AUXB: 0.357
+BIN_CONT: -0.324
+BIN_PASS: 0.255
+BIN_X: 0.220
+BIN_VPRT: -0.214
+BIN_UH: -0.185
+BIN_TIME: 0.135
+BIN_PUBV: 0.123
+PC14:
+BIN_UH: 0.499
+BIN_QUOT: -0.460
+BIN_VBD: -0.386
+BIN_CONT: -0.361
+BIN_PUBV: -0.265
+BIN_CONJ: 0.243
+BIN_NUM: -0.161
+BIN_VPRT: -0.122
+BIN_STPR: -0.108
+BIN_DEMP: -0.087
+PC15:
+BIN_PUBV: 0.512
+BIN_CONJ: -0.370
+BIN_QUOT: -0.318
+BIN_PGAS: 0.300
+BIN_CONT: -0.268
+BIN_VPRT: 0.244
+BIN_PASS: -0.241
+BIN_NOMZ: 0.229
+BIN_AUXB: -0.177
+BIN_TO: 0.108
+PC16:
+BIN_CONJ: 0.633
+BIN_UH: -0.460
+BIN_PUBV: 0.371
+BIN_NUM: -0.258
+BIN_VBD: -0.198
+BIN_NOMZ: 0.158
+BIN_SCONJ: -0.125
+BIN_PREP: -0.111
+BIN_PIN: -0.111
+BIN_X: 0.106
+PC17:
+BIN_PGAS: 0.513
+BIN_UH: -0.500
+BIN_PUBV: -0.371
+BIN_VPRT: -0.222
+BIN_VBD: -0.204
+BIN_CONJ: -0.176
+BIN_CCONJ: 0.175
+BIN_ART: -0.173
+BIN_X: -0.134
+BIN_INDA: -0.125
+PC18:
+BIN_ART: 0.456
+BIN_DET: -0.342
+BIN_DEMO: -0.306
+BIN_DEMP: -0.285
+BIN_INDA: 0.273
+BIN_PIT: 0.221
+BIN_FPP1: -0.220
+BIN_CCONJ: 0.219
+BIN_CONJ: -0.214
+BIN_AUXB: 0.211
+Top 10 PC1 values:
+PC1 PC2 PC3 ... week_index priority closed_relevance
+24527 124.414338 -16.859224 4.745394 ... -19 NaN NaN
+707 124.395927 -16.870930 4.737835 ... -16 NaN NaN
+744 124.395927 -16.870930 4.737835 ... -32 NaN NaN
+749 124.395927 -16.870930 4.737835 ... -32 NaN NaN
+2243 124.395927 -16.870930 4.737835 ... -28 NaN NaN
+5921 124.395927 -16.870930 4.737835 ... -32 NaN NaN
+5933 124.395927 -16.870930 4.737835 ... -33 NaN NaN
+5935 124.395927 -16.870930 4.737835 ... -32 NaN NaN
+10080 124.395927 -16.870930 4.737835 ... -11 NaN NaN
+10418 124.395927 -16.870930 4.737835 ... -29 NaN NaN
+
+[10 rows x 26 columns]
+
+Bottom 10 PC1 values:
+PC1 PC2 PC3 ... week_index priority closed_relevance
+13752 -24.875039 3.698789 0.288475 ... 11 NaN NaN
+18942 -24.875039 3.698789 0.288475 ... -85 NaN NaN
+14276 -24.572975 0.683877 7.752763 ... 10 NaN NaN
+19869 -24.572975 0.683877 7.752763 ... -86 NaN NaN
+25556 -23.009401 -10.063628 2.942026 ... 6 NaN NaN
+23477 -22.592972 -0.970333 -2.614535 ... 7 NaN NaN
+13907 -22.489084 10.362266 -8.549736 ... -9 NaN NaN
+14824 -22.001266 -17.807228 5.081855 ... 441 NaN NaN
+21189 -22.001266 -17.807228 5.081855 ... 345 NaN NaN
+24439 -21.740588 -10.174665 5.702323 ... 110 NaN NaN
+
+[10 rows x 26 columns]
+Top 10 PC2 values:
+PC1 PC2 PC3 ... week_index priority closed_relevance
+117 53.467507 89.625455 44.442476 ... 4 NaN NaN
+2447 53.136124 89.414757 44.306417 ... 138 NaN NaN
+2471 53.136124 89.414757 44.306417 ... 20 NaN NaN
+22224 53.117714 89.403051 44.298858 ... 10 NaN NaN
+2728 19.109757 77.304171 11.238337 ... 8 NaN NaN
+5024 19.109757 77.304171 11.238337 ... 8 NaN NaN
+5135 19.109757 77.304171 11.238337 ... 8 NaN NaN
+17701 -14.842968 65.240407 -21.799507 ... -83 NaN NaN
+17591 -14.861378 65.228702 -21.807066 ... -100 NaN NaN
+24735 -14.916609 65.193586 -21.829743 ... 43 NaN NaN
+
+[10 rows x 26 columns]
+
+Bottom 10 PC2 values:
+PC1 PC2 PC3 ... week_index priority closed_relevance
+14558 56.232734 -41.162334 -61.443677 ... 10 NaN NaN
+6321 56.251144 -41.150628 -61.436118 ... 302 NaN NaN
+6322 56.251144 -41.150628 -61.436118 ... 139 NaN NaN
+6770 56.251144 -41.150628 -61.436118 ... 120 NaN NaN
+6771 56.251144 -41.150628 -61.436118 ... 120 NaN NaN
+10442 56.251144 -41.150628 -61.436118 ... 383 NaN NaN
+10443 56.251144 -41.150628 -61.436118 ... 383 NaN NaN
+10528 56.251144 -41.150628 -61.436118 ... 93 NaN NaN
+10529 56.251144 -41.150628 -61.436118 ... 93 NaN NaN
+11837 56.251144 -41.150628 -61.436118 ... 133 NaN NaN
+
+[10 rows x 26 columns]
+job finished, cleaning up
+job pau at: Thu Sep 25 09:53:27 CDT 2025
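The "Variance of each PCA component" line in the description log reports raw variances, which are easier to compare as shares. A minimal sketch of that arithmetic follows, with the 18 values copied from the log. Note the assumption: the shares are relative to the 18 retained components only, so they equal the usual explained-variance ratio only if no further components were dropped. The function name `variance_shares` is hypothetical.

```python
# Values copied from the "Variance of each PCA component" line of
# p2/quest/description_092525_neurobiber-pca.log.
component_variances = [
    259.38215213, 83.11803664, 67.16301107, 61.78747188, 38.94875996,
    32.78688889, 26.45592105, 21.9280629, 18.734197, 16.29485568,
    13.48304855, 11.50594609, 10.77855857, 9.30674176, 8.96113511,
    8.35521401, 8.17815209, 7.13194427,
]


def variance_shares(variances):
    """Return each variance as a fraction of the sum of all listed variances."""
    total = sum(variances)
    return [v / total for v in variances]


shares = variance_shares(component_variances)

# Print the per-component share alongside the running total.
cumulative = 0.0
for i, share in enumerate(shares, start=1):
    cumulative += share
    print(f"PC{i}: {share:.3f} (cumulative {cumulative:.3f})")
```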
BIN
p2/quest/description_closed_relevance_092525_biber_pca_final.png
Normal file
Binary file not shown.
Size: 1.6 MiB
@@ -58,7 +58,7 @@ if __name__ == "__main__":
     biber_vecs_pca = pca.fit_transform(biber_vecs)
     with open('092525_description_pca.pkl', 'wb') as f:
         pickle.dump(pca, f)
-    selected_axis = "AuthorWMFAffil"
+    selected_axis = "closed_relevance"

     component_variances = np.var(biber_vecs_pca, axis=0)
     print("Variance of each PCA component:", component_variances)
@@ -84,7 +84,7 @@ if __name__ == "__main__":
     pc_dict['closed_relevance'] = biber_vec_df['closed_relevance']

     plot_df = pd.DataFrame(pc_dict)
-    plot_df.to_csv("092325_description_PCA_df.csv", index=False)
+    #plot_df.to_csv("092325_subcomment_PCA_df.csv", index=False)

     print("Top 10 PC1 values:")
     print(plot_df.nlargest(10, "PC1"))
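The two hunks above show only fragments of the script, so here is a minimal, self-contained sketch of the same pattern: fit a PCA, print per-component variances, and list the strongest loadings per component in the style of the logs. It assumes `pca` is a scikit-learn `PCA` object (the diff does not confirm this), uses synthetic data in place of the real `biber_vecs`, and the helper `top_loadings` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the feature matrix; the real script's input
# and feature list are not visible in the diff.
rng = np.random.default_rng(0)
feature_names = ["BIN_CAP", "BIN_NNP", "BIN_NN", "BIN_RB", "BIN_DET"]
biber_vecs = rng.normal(size=(200, len(feature_names)))

pca = PCA(n_components=3)
biber_vecs_pca = pca.fit_transform(biber_vecs)

# Same call as in the diff: population variance of each score column.
component_variances = np.var(biber_vecs_pca, axis=0)
print("Variance of each PCA component:", component_variances)


def top_loadings(fitted_pca, names, k=3):
    """Map each component to its k features with the largest absolute weight."""
    result = {}
    for i, component in enumerate(fitted_pca.components_, start=1):
        order = np.argsort(-np.abs(component))[:k]
        result[f"PC{i}"] = [(names[j], float(component[j])) for j in order]
    return result


loadings = top_loadings(pca, feature_names)
for pc, pairs in loadings.items():
    print(f"{pc}:")
    for name, weight in pairs:
        print(f"{name}: {weight:.3f}")
```

One detail worth knowing when comparing numbers: `np.var` defaults to the population variance (`ddof=0`), while scikit-learn's `explained_variance_` divides by `n - 1`, so the two differ by a factor of `(n - 1) / n`.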
BIN
p2/quest/subcomment_AuthorWMFAffil_092525_biber_pca_final.png
Normal file
Binary file not shown.
Size: 2.4 MiB