diff --git a/article.Rtex b/article.Rtex
index 8009873..32d8c94 100644
--- a/article.Rtex
+++ b/article.Rtex
@@ -680,9 +680,9 @@
 evaluate it in comparison to other recently proposed error correction methods. Our MLA method is the only one that is effective across a wide range of scenarios. It is also straightforward to use. Our implementation in the R package \texttt{misclassificationmodels} provides a familiar formula interface for regression models. Remarkably, our simulations show that our method can use even an automated classifier below common accuracy standards to obtain consistent estimates. Therefore, low accuracy is not necessarily a barrier to using an AC.
-Based on these results, we provide four recommendations for the future of automated content analysis: Researchers should (1) attempt manual content analysis before building or validating ACs to see whether human-labeled data is sufficient, (2) use manually annotated data to test for systematic misclassification and choose appropiate error correction methods, (3) correct for misclassifications via error correction methods, and (4) be transparent about the methodological decisions involved in AC-based classifications and error correction.
+Based on these results, we provide four recommendations for the future of automated content analysis: Researchers should (1) attempt manual content analysis before building or validating ACs to see whether human-labeled data is sufficient, (2) use manually annotated data to test for systematic misclassification and choose appropriate error correction methods, (3) correct for misclassification via error correction methods, and (4) be transparent about the methodological decisions involved in AC-based classifications and error correction.
-Our study has several limitations. First, the simulations and methods we introduce focus on misclassification by automated tools. They provisionally assume that human annotators do not make errors, especially systematic ones.
+Our study has several limitations. First, the simulations and methods we introduce focus on misclassification by automated tools. They provisionally assume that human annotators do not make errors, especially systematic ones. This assumption can be reasonable if intercoder reliability is very high but, as with ACs, this may not always be the case.
 %Alternatively, validation data can be treated as a gold standard if the goal is measuring \emph{how a person categorizes content}, as opposed to the more common approach of measuring presumably objective content categories.
 That said, the prevailing approaches in content analysis use human coders, who are prone to misclassification, to measure a latent category. Thus, it may be important to account for measurement error by human coders \citep{bachl_correcting_2017} and by automated classifiers simultaneously. In theory, it is possible to extend our MLA approach in order to do so \citep{carroll_measurement_2006}.
@@ -876,49 +876,50 @@ summary(res)
 
 For more information about the package, please see here: \url{https://osf.io/pyqf8/?view_only=c80e7b76d94645bd9543f04c2a95a87e}.
+\section{Additional plots for Simulations 1 and 2}
+\label{appendix:main.sim.plots}
+
+\begin{figure}[htbp!]
+<>=
+
+p <- plot.simulation.iv(plot.df.example.1,iv='z')
+
+grid.draw(p)
+@
+\caption{Estimates of $B_Z$ in \emph{simulation 1a}, multivariate regression with $X$ measured using machine learning and model accuracy independent of $X$, $Y$, and $Z$. All methods obtain precise and accurate estimates given sufficient validation data.}
+\end{figure}
+
+\begin{figure}[htbp!]
+<>=
+p <- plot.simulation.iv(plot.df.example.2, iv='z')
+grid.draw(p)
+@
+\caption{Estimates of $B_Z$ in \emph{simulation 1b}, multivariate regression with $X$ measured using machine learning, model accuracy correlated with $X$ and $Y$, and differential error.
Only multiple imputation and our MLA model with a full specification of the error model obtain consistent estimates of $B_Z$.\label{fig:sim1b.z}}
+\end{figure}
+
+\begin{figure}[htbp!]
+<>=
+#plot.df <-
+p <- plot.simulation.dv(plot.df.example.3,'z')
+grid.draw(p)
+@
+\caption{Estimates of $B_Z$ in \emph{simulation 2a}, multivariate regression with $Y$ measured using an AC that makes errors. Only our MLA model with a full specification of the error model obtains consistent estimates.}
+\end{figure}
+
+\begin{figure}[htbp!]
+<>=
+#plot.df <-
+p <- plot.simulation.dv(plot.df.example.4,'x')
+grid.draw(p)
+@
+\caption{Estimates of $B_X$ in \emph{simulation 2b}, multivariate regression with $Y$ measured using machine learning, model accuracy correlated with $Z$ and $Y$, and differential error. Only our MLA model with a full specification of the error model obtains consistent estimates.\label{fig:sim2b.z}}
+\end{figure}
+
+
 \section{Robustness Tests}\label{appendix:robustness}
 
 Appendix \ref{appendix:robustness} discusses robustness tests for our simulations. In the following sections, we show what happens when the error model is misspecified (see section \ref{appendix:misspec}), when the accuracy of the classifier varies (see section \ref{appendix:accuracy}), when the classified variable is not balanced but skewed (see section \ref{appendix:imbalanced}), and when the degree of systematic misclassification changes (see section \ref{appendix:degreebias}).
-%\subsection{Additional plots for Simulations 1 and 2}
-%\label{appendix:main.sim.plots}
-
-%\begin{figure}[htbp!]
-%<>=
-
-%p <- plot.simulation.iv(plot.df.example.1,iv='z')
-
-%grid.draw(p)
-%@
-%\caption{Estimates of $B_Z$ in \emph{simulation 1a}, multivariate regression with $X$ measured using machine learning and model
-%accuracy independent of $X$, $Y$, and $Z$. All methods obtain precise and accurate estimates given sufficient validation data.}
-%\end{figure}
-
-%\begin{figure}[htbp!]
-%<>=
-%p <- plot.simulation.iv(plot.df.example.2, iv='z')
-%grid.draw(p)
-%@
-%\caption{Estimates of $B_Z$ in multivariate regression with $X$ measured using machine learning and model accuracy correlated with
-%$X$ and $Y$ and error is differential. Only multiple imputation and our MLA model with a full specification of the error model
-%obtain consistent estimates of $B_X$.\label{fig:sim1b.z}}
-%\end{figure}
-
-%\begin{figure}[htbp!]
-%<>=
-%#plot.df <-
-%p <- plot.simulation.dv(plot.df.example.3,'z')
-%grid.draw(p)
-%@
-%\caption{Estimates of $B_Z$ in \emph{simulation 2a}, multivariate regression with $Y$ measured using an AC that makes errors. Only
-%our MLA model with a full specification of the error model obtains consistent estimates.}
-%\end{figure}
-
-%\begin{figure}[htbp!]
-%<>=
-%#plot.df <-
-%p <- plot.simulation.dv(plot.df.example.4,'x')
-%grid.draw(p)
-%@
-%\caption{Estimates of $B_X$ in \emph{simulation 2b} multivariate regression with $Y$ measured using machine learning, model
-%accuracy correlated with $Z$ and $Y$ and differential error. Only our MLA model with a full specification of the error model
-%obtains consistent estimates. \label{fig:sim2b.z}}
-%\end{figure}
-
 %\clearpage
 
 \subsection{Robustness Test I: Misspecification of the Error Correction Model}
 \label{appendix:misspec}
@@ -994,11 +995,13 @@ grid.draw(p)
 source('resources/robustness_check_plots.R')
 @
 
-Next, we repeat \emph{Simulation 1a} to show how varying accuracy of the AC affects estimates of independent variables $B_X$ and $B_Z$. Here, we let classifier accuracy range
+Next, we repeat \emph{Simulation 1a} and \emph{Simulation 2a} to show how varying accuracy of the AC affects estimates of independent variables $B_X$ and $B_Z$. Here, we let classifier accuracy range
 from \Sexpr{format.percent(min(robust_2_min_acc))} to \Sexpr{format.percent(max(robust_2_max_acc))}.
-In Figure \ref{fig:iv.predacc}, we present results for 5,000 classifications and 100 annotations.
+In Figure \ref{fig:iv.predacc}, we present results for \emph{Simulation 1a}, where an independent variable is automatically classified with 5,000 classifications and 100 annotations. As expected, a more accurate classifier causes less misclassification bias. All the error correction methods also provide more precise estimates when used with a more accurate classifier.
+As Figure \ref{fig:dv.predacc} shows, these patterns are similar when the dependent variable is automatically classified, as in \emph{Simulation 2a}.
+
 \begin{figure}[htpb!]
 \begin{subfigure}{0.95\textwidth}
 <>=
@@ -1015,10 +1018,31 @@ grid.draw(p)
 @
 \caption{Estimates of $B_Z$ improve with higher accuracy of the AC.}
 \end{subfigure}
 \caption{Robustness Test II: Varying Accuracy of the Automated Classifier, Simulation 1a}
 \label{fig:iv.predacc}
 \end{figure}
 
+\begin{figure}[htpb!]
+\begin{subfigure}{0.95\textwidth}
+<>=
+p <- plot.robustness.2.dv('x')
+grid.draw(p)
+@
+\caption{Estimates of $B_X$ improve with higher accuracy of the AC.}
+\end{subfigure}
+
+\begin{subfigure}{0.95\textwidth}
+<>=
+p <- plot.robustness.2.dv('z')
+grid.draw(p)
+@
+\caption{Estimates of $B_Z$ improve with higher accuracy of the AC.}
+\end{subfigure}
+\caption{Robustness Test II: Varying Accuracy of the Automated Classifier, Simulation 2a}
+\label{fig:dv.predacc}
+\end{figure}
+
+
 \clearpage
 
 \subsection{Robustness Test III: Misclassification in Imbalanced Variables}
@@ -1030,7 +1054,6 @@ For simplicity, our main simulations include balanced classified variables. How
 \label{appendix:imbalanced.iv}
 
 Replicating \emph{Simulation 1a}, Figure \ref{fig:iv.imbalanced} illustrates that our MLA method performs similarly well with imbalance in classified independent variables.
-%Although the Fischer approximation for confidence intervals performs poorly, the profile likelihood method works well.
 However, the quality of uncertainty quantification of methods tends to degrade as imbalance increases. This suggests that imbalanced data requires additional validation data for effective misclassification correction. Please note that the PL approach has very large confidence intervals and is thus excluded from Figure \ref{fig:iv.imbalanced} for readability.
 
 \begin{figure}[htpb!]
@@ -1085,7 +1108,8 @@ Lastly, we explore what happens if misclassification is more or less systematic.
 \subsubsection{Systematic Misclassification in an Independent Variable}
 \label{appendix:degreebias.iv}
-Replicating \emph{Simulation 1b}, Figure \ref{fig:iv.degreebias} underlines that our MLA method performs well even for higher degrees of systematic misclassification in the independent variable. With fairly high degrees of systematic misclassification, however, estimations of $B_Z$ in particular become inconsistent.
+Replicating \emph{Simulation 1b}, Figure \ref{fig:iv.degreebias} underlines that our MLA method performs well even for higher degrees of systematic misclassification in the independent variable.
+% With fairly high degrees of systematic misclassification, however, estimations of $B_Z$ in particular become inconsistent.
 
 \begin{figure}[htpb!]
 \begin{subfigure}{0.95\textwidth}
 <>=
@@ -1126,7 +1150,7 @@ grid.draw(p)
 p <- plot.robustness.4.dv('z')
 grid.draw(p)
 @
-\caption{Estimates of $B_Z$ become inconsistent with increasing misclassication in $Y$.}
+\caption{Estimates of $B_Z$ become inconsistent with increasing misclassification in $Y$.}
 \end{subfigure}
 \caption{Robustness Test IV: Different Degrees of Systematic Misclassification, Simulation 2b}
 \label{fig:dv.degreebias}