Update on Overleaf.
This commit is contained in:
article.Rtex
@@ -637,7 +637,7 @@ As demonstrated in our simulations, knowing whether an AC makes systematic miscl
Fortunately, manually annotated data can be used to detect systematic misclassification.
For example, \citet{fong_machine_2021} suggest using Sargan's J-test of the null hypothesis that the product of the AC's predictions and regression residuals has an expected value of 0.
More generally, one can test if the data's conditional independence structures can be represented by Figures \ref{fig:simulation.1a} or \ref{fig:simulation.2a}. This can be done, for example, via likelihood ratio tests of $P(W|X,Z) = P(W|X,Y,Z)$ (if an AC measures an independent variable $X$) or of $P(W|Y) = P(W|Y,Z,X)$ (if an AC measures a dependent variable $Y$), or by visual inspection of plots relating misclassifications to other variables \citep{carroll_measurement_2006}.
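As a concrete sketch of such a likelihood ratio test for the case where an AC measures an independent variable $X$, one can compare nested logistic models for $W$ on the manually annotated subset. The data frame, variable names, and simulated data below are illustrative choices of ours, made only so the sketch is self-contained.

```r
# Likelihood ratio test of P(W|X,Z) = P(W|X,Y,Z) on annotated data.
# `val` stands in for the manually annotated validation cases; all
# variable names and the simulated data are illustrative.
set.seed(1)
n <- 500
val <- data.frame(x = rbinom(n, 1, 0.5), z = rnorm(n))
val$y <- rbinom(n, 1, plogis(val$x + val$z))
val$w <- rbinom(n, 1, plogis(3 * val$x - 1.5))  # error does not depend on y

restricted <- glm(w ~ x + z, data = val, family = binomial)
full       <- glm(w ~ x + y + z, data = val, family = binomial)

# A small p-value suggests W depends on Y beyond X and Z,
# i.e., systematic (differential) misclassification.
anova(restricted, full, test = "LRT")
```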
We strongly recommend using such methods to test for systematic misclassification and to design an appropriate correction.

% For example, ``algorithmic audits'' \citep[e.g.,][]{rauchfleisch_false_2020, kleinberg_algorithmic_2018} evaluate the performance of AC across different subgroups in the data.
@@ -672,10 +672,11 @@ Therefore, both corrected and uncorrected estimates should be presented as part
\section{Conclusion and Limitations}
Misclassification bias is an important threat to validity in studies that use automatic classifiers to measure statistical variables.
As we showed in an example with data from the Perspective API, widely used and very accurate automated classifiers can cause type-I and type-II errors.
As evidenced by our literature review, this problem has not attracted enough attention within communication science \citep[but see][]{bachl_correcting_2017} or in the broader computational social science community.
Although current best practices of reporting classifier performance metrics, such as precision or recall, on manually annotated validation data are important, they provide little protection against misclassification bias.
These practices use annotations to enact a transparency ritual to ward against misclassification bias, but annotations can do much more. With the right statistical model, they can correct misclassification bias.

We introduce maximum likelihood adjustment (MLA), a new method we designed to correct misclassification bias, and use Monte Carlo simulations to evaluate it in comparison to other recently proposed error correction methods.
Our MLA method is the only one that is effective across a wide range of scenarios. It is also straightforward to use. Our implementation in the R package \texttt{misclassificationmodels} provides a familiar formula interface for regression models.
@@ -790,7 +791,7 @@ Only slightly more than half of all studies included information on the size of
\section{Other Error Correction Methods}
\label{appendix:other.methods}
Statisticians have introduced a range of other error correction methods that we did not test in our simulations. Here, we briefly discuss three additional methods and explain why we did not include them.

\emph{Simulation extrapolation} (SIMEX) simulates the process generating measurement error to model how measurement error affects an analysis and, ultimately, to approximate an analysis with no measurement error \citep{carroll_measurement_2006}. SIMEX is a very powerful and general method that can be used without manually annotated data, but it may be more complicated than necessary for correcting measurement error from ACs when manually annotated data is available. Likelihood methods are easy to apply to misclassification, so SIMEX seems unnecessary \citep{carroll_measurement_2006}.
@@ -815,13 +816,13 @@ To explain why the MLA approach is effective, we follow \citet{carroll_measureme
&= \sum_{x}{P(Y|X=x)P(W|Y,X=x)P(X=x)} \label{eq:mle.covariate.chainrule.4}
\end{align}
\noindent
Equation \ref{eq:mle.covariate.chainrule.1} integrates $X$ out of the joint probability of $Y$ and $W$ by summing over its possible values $x$. If $X$ is binary, this means adding the probability given $x=1$ to the probability given $x=0$. When $X$ is observed, say $x=0$, then $P(X=0)=1$ and $P(X=1)=0$. As a result, only the true value of $X$ contributes to the likelihood. However, when $X$ is unobserved, all of its possible values contribute. In this way, integrating out $X$ allows us to include data where $X$ is not observed in the likelihood.
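To make the integration concrete, here is a small numeric sketch of this sum for a binary $X$; all probability values below are made up purely for illustration.

```r
# Made-up probabilities for a binary X (illustration only)
p_x    <- c(0.6, 0.4)    # P(X = 0), P(X = 1)
p_y_wx <- c(0.30, 0.80)  # P(Y = 1 | W = 1, X = x) for x = 0, 1
p_w_x  <- c(0.10, 0.90)  # P(W = 1 | X = x)        for x = 0, 1

# Unobserved X: sum over both values,
# P(Y = 1, W = 1) = sum_x P(Y | W, X = x) P(W | X = x) P(X = x)
p_yw <- sum(p_y_wx * p_w_x * p_x)          # 0.30*0.10*0.6 + 0.80*0.90*0.4 = 0.306

# Observed X (say x = 0): P(X = 0) = 1, P(X = 1) = 0, only one term survives
p_yw_obs <- sum(p_y_wx * p_w_x * c(1, 0))  # 0.30*0.10 = 0.03
```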
Equation \ref{eq:mle.covariate.chainrule.2} uses the chain rule of probability to factor the joint probability $P(Y,W)$ of $Y$ and $W$ into $P(Y|W,X)$, the conditional probability of $Y$ given $W$ and $X$, and $P(W,X=x)$, the joint probability of $W$ and $X$. This lets us see how maximizing $\mathcal{L}(\Theta|Y,W)$, the joint likelihood of $\Theta$ given $Y$ and $W$, accounts for the uncertainty of automated classifications. For each possible value $x$ of $X$, it weights the model of the outcome $Y$ by the probability that $x$ is the true value and that the AC outputs $W$.
Equation \ref{eq:mle.covariate.chainrule.3} shows a different way to factor the joint probability $P(Y,W)$ so that $W$ is not in the model of $Y$. Since $X$ and $W$ are correlated, if $W$ is in the model for $Y$, the estimation of $B_1$ will be biased. By including $Y$ in the model for $W$, Equation \ref{eq:mle.covariate.chainrule.3} can account for differential measurement error.
Equation \ref{eq:mle.covariate.chainrule.4} factors the joint probability into $P(Y|X=x)$, the conditional probability of $Y$ given $X$, $P(W|X=x,Y)$, the conditional probability of $W$ given $X$ and $Y$, and $P(X=x)$, the probability of $X$. This shows that fitting a model of $Y$ given $X$ in this framework, such as the regression model $Y = B_0 + B_1 X + B_2 Z$, requires including the exposure model for $P(X=x)$. Without validation data, $P(X=x)$ is difficult to calculate without strong assumptions \citep{carroll_measurement_2006}, but it can easily be estimated from a sample of validation data.
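For a binary $X$, the plug-in estimate of the exposure model from validation data is simply the annotated share; the counts below are invented for illustration.

```r
# 200 manual annotations of X (invented counts: 70 ones, 130 zeros)
x_annotated <- c(rep(1, 70), rep(0, 130))
p_hat <- mean(x_annotated)   # plug-in estimate of P(X = 1): 0.35
```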
%Our appendix includes supplementary simulations that explore how robust our method is to model misspecification.
Equations \ref{eq:mle.covariate.chainrule.1}--\ref{eq:mle.covariate.chainrule.4} demonstrate the generality of this method because the conditional probabilities may be calculated using a wide range of probability models.
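As one illustration of this generality, the sketch below maximizes a likelihood of this form for a continuous outcome, a binary covariate $X$, and a binary classifier output $W$, simplified to nondifferential error so that $P(W|X,Y)$ reduces to $P(W|X)$. The simulated data, the parameterization, and the optimizer settings are our own illustrative choices, not the implementation in \texttt{misclassificationmodels}.

```r
# Sketch: maximum likelihood with X integrated out where unannotated.
# Simulated data and parameterization are illustrative only.
set.seed(7)
n <- 2000
x <- rbinom(n, 1, 0.4)
y <- 1 + 2 * x + rnorm(n)                 # true slope B1 = 2
w <- ifelse(runif(n) < 0.85, x, 1 - x)    # AC with ~85% accuracy
annotated <- seq_len(n) <= 200            # X manually annotated here only

negll <- function(theta) {
  b0 <- theta[1]; b1 <- theta[2]; sigma <- exp(theta[3])
  p_err <- plogis(theta[4])               # P(W != X), nondifferential
  p_x1  <- plogis(theta[5])               # exposure model P(X = 1)
  term <- function(xv)                    # P(Y|X=x) P(W|X=x) P(X=x)
    dnorm(y, b0 + b1 * xv, sigma) *
      ifelse(w == xv, 1 - p_err, p_err) *
      ifelse(xv == 1, p_x1, 1 - p_x1)
  # observed X: only the true value contributes; unobserved: sum over x
  lik <- ifelse(annotated, term(x), term(0) + term(1))
  -sum(log(lik))
}
fit <- optim(c(0, 0, 0, -1, 0), negll, method = "BFGS")
fit$par[2]  # estimate of B1, which should recover a value near 2
```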
@@ -831,7 +832,7 @@ Equations \ref{eq:mle.covariate.chainrule.1}--\ref{eq:mle.covariate.chainrule.4}
\subsection{When an AC Measures the Dependent Variable}
Again, we will maximize $\mathcal{L}(\Theta|Y,W)$, the joint likelihood of the parameters $\Theta$ given the outcome $Y$ and automated classifications $W$ measuring the dependent variable $Y$ \citep{carroll_measurement_2006}.
We again use the law of total probability to integrate out $Y$ and the chain rule of probability to factor the joint probability into $P(Y)$, the probability of $Y$, and $P(W|Y)$, the conditional probability of $W$ given $Y$.

\begin{align}
P(Y,W) &= \sum_{y}{P(Y=y,W)} \\
@@ -925,7 +926,7 @@ grid.draw(p)
\section{Robustness Tests}\label{appendix:robustness}
This appendix discusses robustness tests for our simulations. In the following sections, we show what happens when the error model is misspecified (section \ref{appendix:misspec}), when the accuracy of the classifier varies (section \ref{appendix:accuracy}), when the classified variable is not balanced but skewed (section \ref{appendix:imbalanced}), and when the degree of systematic misclassification changes (section \ref{appendix:degreebias}).

%\clearpage
\subsection{Robustness Test I: Misspecification of the Error Correction Model}
@@ -1003,7 +1004,7 @@ source('resources/robustness_check_plots.R')
@
According to our literature review, the accuracy of reported classifiers varies strongly. But how does the performance of the classifier affect error correction methods and remaining bias in inferential modeling? To test this, we repeat \emph{Simulation 1a} (see Section \ref{appendix:iv.predacc}) and \emph{Simulation 2a} (see Section \ref{appendix:dv.predacc}) to show how varying accuracy of the AC affects estimates of independent variables $B_X$ and $B_Z$. Here, we let classifier accuracy range from \Sexpr{format.percent(min(robust_2_min_acc))} to \Sexpr{format.percent(max(robust_2_max_acc))}. We present results for a scenario with 5,000 classifications and 200 manual annotations.

\subsubsection{Varying Accuracy of an AC Predicting an Independent Variable}
\label{appendix:iv.predacc}
@@ -36,6 +36,7 @@
mylabel/.style={
text width=2.2in,
align=center,
execute at begin node =\setlength{\baselineskip}{1.3ex},
inner sep=1ex,
font={\mdseries\itshape\sffamily #1}
}
@@ -75,7 +76,7 @@
\draw[myarrow] (independent.north)+(3ex,0) to [xshift=17ex, yshift=-4.5ex, controls=+(80:1.2) and +(170:1.2)] (outcome_nonsystematic_iv.west) {node [mylabel] {Nonsystematic misclassification}};
\draw[myarrow] (independent.south)+(3ex,0) to [xshift=15ex, yshift=-5.8ex, controls=+(290:0.4) and +(180:0.8)] (outcome_systematic_iv.west) {node [mylabel] { Systematic misclassification}};

\draw[myarrownotip] (correct.south) to [controls=+(280:3) and +(150:2)] (dependent);