Update on Overleaf.
This commit is contained in:
article.Rtex
@@ -537,13 +537,8 @@ In cases similar to \emph{Simulation 1a}, we therefore recommend both GMM and ML
\subsection{\emph{Simulation 1b:} Systematic Misclassification of an Independent Variable}

Figure \ref{fig:sim1b.x} illustrates \emph{Simulation 1b}. Here, systematic misclassification gives rise to differential error, creating a more extreme misclassification bias that is more difficult to correct.
As the figure shows, the naïve estimator is opposite in sign to the true parameter.

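This sign reversal is easy to reproduce in a toy simulation. The following Python sketch is illustrative only: the flip rule, effect size, and sample size are our own invented assumptions, not the paper's design. Its point is simply that when the probability of misclassifying a binary $X$ depends on $Y$ (differential error), the observed proxy can be negatively associated with $Y$ even though the true effect is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5_000, 1.0             # assumed sample size and true effect

x = rng.binomial(1, 0.5, n)      # true binary regressor
y = beta * x + rng.normal(size=n)

# Differential (systematic) misclassification: whether the proxy w
# is wrong depends on the outcome y itself.
hi = y > np.median(y)
wrong = ((x == 1) & hi) | ((x == 0) & ~hi)
w = np.where(wrong & (rng.random(n) < 0.9), 1 - x, x)

def ols_slope(z, v):
    # OLS slope of v on z, with an intercept
    zc = z - z.mean()
    return (zc * (v - v.mean())).sum() / (zc * zc).sum()

b_true = ols_slope(x, y)    # close to +1
b_naive = ols_slope(w, y)   # negative: opposite in sign to beta
```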
Of the four methods we test, only the MLA and MI approaches provide consistent estimates. This is expected because they use $Y$ to adjust for misclassifications. The bottom row of Figure \ref{fig:sim1b.x} shows how the precision of the MI and MLA estimates increases with additional observations. As in \emph{Simulation 1a}, MLA uses these data more efficiently than MI does. However, due to the low accuracy and bias of the AC, additional unlabeled data improve precision less than one might expect. Both methods provide acceptably accurate confidence intervals. Figure \ref{fig:sim1b.z} in Appendix \ref{appendix:main.sim.plots} shows that, as in \emph{Simulation 1a}, effective correction for misclassifications of $X$ is required to consistently estimate $B_Z$, the coefficient of $Z$ on $Y$. Inspecting results from methods that do not correct for differential error is useful for understanding their limitations. When few annotations of $X$ are observed, GMM is nearly as bad as the naïve estimator. PL is also visibly biased. Both improve as a greater proportion of the data is labeled, since they combine AC-based estimates with the feasible estimator.

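The role of the labeled subset can be seen in a toy setting: an estimator fit on the human-labeled rows alone (the feasible estimator) is consistent, but its sampling spread shrinks roughly as $1/\sqrt{m}$ in the number of labels $m$. A hedged Python sketch (sample sizes and effect size are invented for illustration, not taken from the simulations):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.binomial(1, 0.5, n)      # true labels, known only for a subset
y = x + rng.normal(size=n)

def ols_slope(z, v):
    zc = z - z.mean()
    return (zc * (v - v.mean())).sum() / (zc * zc).sum()

def feasible_sd(m, reps=500):
    """Empirical spread of the slope fit on m randomly labeled rows."""
    draws = [rng.choice(n, m, replace=False) for _ in range(reps)]
    return np.std([ols_slope(x[i], y[i]) for i in draws])

sd_250, sd_4000 = feasible_sd(250), feasible_sd(4_000)
# precision improves markedly as more rows are hand-labeled
```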
In sum, our simulations suggest that the MLA approach is superior under conditions of differential error. Although the MI approach's estimates are consistent, its practicality is limited by its inefficiency.

\begin{figure}[htbp!]
<<example2.x, echo=FALSE, message=FALSE, warning=FALSE, results='asis', dev='pdf', fig.width=6, fig.asp=.65, cache=F>>=
p <- plot.simulation.iv(plot.df.example.2, iv='x')
grid.draw(p)
@
\caption{\emph{Simulation 1b:} Systematic misclassification of an independent variable. Only the MLA approach obtains consistent estimates of $B_X$. \label{fig:sim1b.x}}
\end{figure}

\subsection{\emph{Simulation 2a:} Nonsystematic Misclassification of a Dependent Variable}

\begin{figure}[htbp!]
<<example3.x, echo=FALSE, message=FALSE, warning=FALSE, results='asis', dev='pdf', fig.width=6, fig.asp=.65, cache=F>>=
p <- plot.simulation.dv(plot.df.example.3, 'z')
grid.draw(p)
@
\caption{\emph{Simulation 2a:} Nonsystematic misclassification of a dependent variable. Only the MLA approach obtains consistent estimates. \label{fig:sim2a.x}}
\end{figure}

Figure \ref{fig:sim2a.x} illustrates \emph{Simulation 2a}: nonsystematic misclassification of a dependent variable. This also introduces bias, as evidenced by the naïve estimator's inaccuracy. Our MLA method is able to correct this error and provide consistent estimates.
Surprisingly, the MI estimator is inconsistent and does not improve with more human-labeled data.

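The logic of correcting a misclassified outcome can be sketched for a logistic model with randomly flipped binary labels. The Python example below is our own illustration, not the article's MLA implementation: the flip rate $\varepsilon$, effect size, and estimation code are assumptions. The naïve fit is biased toward zero, while a likelihood that models $P(y^{\mathrm{obs}}=1\mid x)=\varepsilon+(1-2\varepsilon)\pi(x)$ with known $\varepsilon$ recovers the truth.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, beta, eps = 20_000, 1.5, 0.2   # eps: assumed known flip probability

x = rng.normal(size=n)
pi_true = 1 / (1 + np.exp(-beta * x))
y_true = rng.binomial(1, pi_true)
flip = rng.random(n) < eps
y_obs = np.where(flip, 1 - y_true, y_true)  # nonsystematic misclassification

def nll(b, y, correct):
    """Negative log-likelihood of a no-intercept logistic model."""
    pi = 1 / (1 + np.exp(-b[0] * x))
    if correct:                    # model the flipping process explicitly
        pi = eps + (1 - 2 * eps) * pi
    pi = np.clip(pi, 1e-10, 1 - 1e-10)
    return -(y * np.log(pi) + (1 - y) * np.log(1 - pi)).sum()

b_naive = minimize(nll, [0.0], args=(y_obs, False)).x[0]  # attenuated
b_mla = minimize(nll, [0.0], args=(y_obs, True)).x[0]     # near beta
```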
@@ -562,17 +572,8 @@ It is clear that the precision of the MLA estimator improves with more observati
When the amount of human-labeled data is low, inaccuracies in the 95\% confidence intervals of both MLA and PL become visible due to the poor finite-sample properties of the quadratic approximation for standard errors.
%As before, PL's inaccurate confidence intervals are due to its use of finite-sample estimates of automated classification probabilities.
%In both cases, the poor finite-sample properties of the Fisher-information quadratic approximation contribute to this inaccuracy. In Appendix \ref{appendix:sim1.profile}, we show that the MLA method's inaccuracy vanishes when using the profile-likelihood method instead.
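The finite-sample weakness of the quadratic (Wald) approximation, and the profile-likelihood alternative, can be illustrated with the simplest possible case: a binomial proportion with few successes (the counts below are invented for illustration). The Wald interval is symmetric around the estimate and can dip below zero, whereas the likelihood-ratio interval respects the parameter space.

```python
import numpy as np

k, n = 2, 20                     # few "successes": small-sample regime
p_hat = k / n

# Wald (quadratic) 95% CI: symmetric, can escape [0, 1]
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Likelihood-ratio 95% CI: all p with deviance 2*(l(p_hat)-l(p)) <= 3.84
def loglik(p):
    return k * np.log(p) + (n - k) * np.log(1 - p)

grid = np.linspace(1e-4, 1 - 1e-4, 100_000)
keep = 2 * (loglik(p_hat) - loglik(grid)) <= 3.84
lr = (grid[keep].min(), grid[keep].max())
# wald[0] is negative; lr stays inside (0, 1) and is asymmetric
```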
In brief, our simulations suggest that MLA is the best error-correction method when random misclassifications affect the dependent variable. It is the only consistent option, and it is more efficient than the PL method, which is almost consistent.

\subsection{\emph{Simulation 2b}: Systematic Misclassification of a Dependent Variable}