
Comment out irrelevant stuff that made knitr fail.

This commit is contained in:
2023-03-02 13:22:24 -08:00
parent e36c6a742f
commit 5e0c2de35b
2 changed files with 15 additions and 12 deletions


@@ -1254,20 +1254,20 @@ Instead of tayloring an AC for a research study, using predictive features direc
\item These issues become even more important, and also more complex, in research designs such as those involving multiple languages.
\subsection{Imperfect human-coded validation data}
% \subsection{Imperfect human-coded validation data}
All approaches stated above depend on the human-coded validation data $X^*$. Most often, ACs are also trained on human-coded material. The content analysis literature has long documented how unreliable human coding is, and manual content analysis papers routinely report intercoder reliability as a result \citep{krippendorff_content_2018}. Intercoder reliability metrics typically assume that human coders are interchangeable and that the only source of disagreement is ``coder idiosyncrasies'' \citep{krippendorff_reliability_2004}. A previous Monte Carlo simulation operationalizes these ``coder idiosyncrasies'' as a fixed probability that a coder makes a random guess, independent of the coder and of the material \citep{geis_statistical_2021}. In this work, we accept this ``interchangeable coders making random errors'' (ICMRE) assumption. Under this optimistic assumption, only ``coder idiosyncrasies'' cause misclassification error in the validation data.
% All approaches stated above depend on the human-coded validation data $X^*$. Most often, ACs are also trained on human-coded material. The content analysis literature has long documented how unreliable human coding is, and manual content analysis papers routinely report intercoder reliability as a result \citep{krippendorff_content_2018}. Intercoder reliability metrics typically assume that human coders are interchangeable and that the only source of disagreement is ``coder idiosyncrasies'' \citep{krippendorff_reliability_2004}. A previous Monte Carlo simulation operationalizes these ``coder idiosyncrasies'' as a fixed probability that a coder makes a random guess, independent of the coder and of the material \citep{geis_statistical_2021}. In this work, we accept this ``interchangeable coders making random errors'' (ICMRE) assumption. Under this optimistic assumption, only ``coder idiosyncrasies'' cause misclassification error in the validation data.
\citet{song_validations_2020}'s Monte Carlo simulation demonstrates that human-coded $X^*$ with lower intercoder reliability yields more biased estimates of the AC's classification accuracy. So even if manual annotation errors arise only from ``coder idiosyncrasies'' under the ICMRE assumption, they may bias results. None of the above correction approaches accounts for the imperfect human coding of $X^*$, although \citet{zhang_how_2021} identifies this omission as a weakness of his proposed approach. Even in the context of manual content analysis, these ``coder idiosyncrasies'' are not routinely adjusted for (although methods are available, e.g. \citet{bachl_correcting_2017}).
An advantage of our proposed method over prior approaches is that it automatically accounts for the imperfection of human coding under the ICMRE assumption, because the random errors in the validation data are independent of the AC's errors.
% \citet{song_validations_2020}'s Monte Carlo simulation demonstrates that human-coded $X^*$ with lower intercoder reliability yields more biased estimates of the AC's classification accuracy. So even if manual annotation errors arise only from ``coder idiosyncrasies'' under the ICMRE assumption, they may bias results. None of the above correction approaches accounts for the imperfect human coding of $X^*$, although \citet{zhang_how_2021} identifies this omission as a weakness of his proposed approach. Even in the context of manual content analysis, these ``coder idiosyncrasies'' are not routinely adjusted for (although methods are available, e.g. \citet{bachl_correcting_2017}).
% An advantage of our proposed method over prior approaches is that it automatically accounts for the imperfection of human coding under the ICMRE assumption, because the random errors in the validation data are independent of the AC's errors.
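The ICMRE assumption lends itself to a short simulation. The sketch below is illustrative only (written in Python rather than the manuscript's R/knitr code); the binary category setup and the 20\% guessing probability are assumptions for the example, not values from the paper. With probability \texttt{p\_guess}, a coder ignores the material and guesses a category uniformly at random, so with two categories the expected error rate is \texttt{p\_guess}/2:

```python
import numpy as np

def icmre_code(true_labels, p_guess, n_cats=2, rng=None):
    """Simulate one interchangeable coder under the ICMRE assumption:
    with probability p_guess the coder ignores the material and records
    a uniformly random category; otherwise the true label is recorded."""
    rng = rng or np.random.default_rng()
    guesses = rng.integers(0, n_cats, size=len(true_labels))
    guess_mask = rng.random(len(true_labels)) < p_guess
    return np.where(guess_mask, guesses, true_labels)

rng = np.random.default_rng(42)
truth = rng.integers(0, 2, size=100_000)   # binary true labels
coded = icmre_code(truth, p_guess=0.2, rng=rng)

# A random guess is still correct half the time with 2 categories,
# so the observed error rate should be close to 0.2 / 2 = 0.10.
print((coded != truth).mean())
```

Note that the simulated error is independent of the coder and of the material, which is exactly the independence our correction method exploits.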
The precision of estimates can be improved by using more than one independent coder. With two coders, for example, two sets of validation data are generated, $X^*_{1}$ and $X^*_{2}$. We then list-wise delete all data for which $X^*_{1} \neq X^*_{2}$. If the ICMRE assumption holds, the deleted data, where the two coders disagree, can only be due to ``coder idiosyncrasies''. As coders are assumed to be interchangeable, the probability that two interchangeable coders both make the same misclassification error is much smaller than the probability that one coder makes a misclassification error. Using such ``labeled-only, coherent-only'' (LOCO) data improves the precision of consistent estimates in our simulation.
% The precision of estimates can be improved by using more than one independent coder. With two coders, for example, two sets of validation data are generated, $X^*_{1}$ and $X^*_{2}$. We then list-wise delete all data for which $X^*_{1} \neq X^*_{2}$. If the ICMRE assumption holds, the deleted data, where the two coders disagree, can only be due to ``coder idiosyncrasies''. As coders are assumed to be interchangeable, the probability that two interchangeable coders both make the same misclassification error is much smaller than the probability that one coder makes a misclassification error. Using such ``labeled-only, coherent-only'' (LOCO) data improves the precision of consistent estimates in our simulation.
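The two-coder LOCO procedure can be sketched in the same way. Again this is only an illustration under assumed values (binary categories, a 20\% guessing probability): two independent simulated coders label the same material, units where they disagree are list-wise deleted, and the residual error rate among the retained (agreeing) units is far below a single coder's error rate, because both coders must make the same error independently:

```python
import numpy as np

def icmre_code(true_labels, p_guess, rng):
    """One coder under ICMRE: with probability p_guess, replace the true
    binary label with a uniform random guess."""
    guesses = rng.integers(0, 2, size=len(true_labels))
    guess_mask = rng.random(len(true_labels)) < p_guess
    return np.where(guess_mask, guesses, true_labels)

rng = np.random.default_rng(7)
truth = rng.integers(0, 2, size=200_000)
x1 = icmre_code(truth, 0.2, rng)   # coder 1's validation labels
x2 = icmre_code(truth, 0.2, rng)   # coder 2's validation labels

# LOCO: list-wise delete every unit where the two coders disagree.
agree = x1 == x2
single_err = (x1 != truth).mean()              # one coder's error rate (~0.10)
loco_err = (x1[agree] != truth[agree]).mean()  # error rate of retained labels
print(single_err, loco_err)
```

With a per-coder error rate $e$, the retained labels are wrong only when both coders independently make the same error, so the residual error rate is roughly $e^2/\bigl((1-e)^2 + e^2\bigr)$, an order of magnitude smaller here.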
\subsection{Measurement error in validation data}
% \subsection{Measurement error in validation data}
The simulations above assume that the validation data is perfectly accurate. This is obviously unrealistic because validation data, such as that obtained from human coders, normally contains inaccuracies.
To evaluate the robustness of the correction methods to imperfect validation data, we extend our scenarios with nondifferential error: simulated validation data that is misclassified \Sexpr{format.percent(med.loco.accuracy)} of the time at random.
% The simulations above assume that the validation data is perfectly accurate. This is obviously unrealistic because validation data, such as that obtained from human coders, normally contains inaccuracies.
% To evaluate the robustness of the correction methods to imperfect validation data, we extend our scenarios with nondifferential error: simulated validation data that is misclassified \Sexpr{format.percent(med.loco.accuracy)} of the time at random.
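Injecting such nondifferential error can be sketched as follows. This is a hedged illustration in Python rather than the manuscript's R/knitr code, and the 5\% flip rate is a placeholder: the actual rate used in the paper comes from the \texttt{\textbackslash Sexpr} expression above. The flip probability is identical for every unit, regardless of the true label, which is what makes the error nondifferential:

```python
import numpy as np

def add_nondifferential_error(labels, error_rate, rng):
    """Flip binary validation labels completely at random. The flip
    probability is the same for every unit, independent of the true
    label and of the classifier, i.e. nondifferential error."""
    flip = rng.random(len(labels)) < error_rate
    return np.where(flip, 1 - labels, labels)

rng = np.random.default_rng(1)
x_star = rng.integers(0, 2, size=50_000)               # clean validation labels
noisy = add_nondifferential_error(x_star, 0.05, rng)   # 0.05 is a placeholder rate

# Observed misclassification rate should be close to the target 5%.
print((noisy != x_star).mean())
```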
\subsubsection{Recommendation II: Employ at Least Two Manual Coders, not One}