Polish Statistical Association
Central Statistical Office of Poland
Subject: Economics, Statistics & Probability
ISSN: 1234-7655
eISSN: 2450-0291
Noriah M. Al-Kandari / Partha Lahiri
Keywords : binary classification, double sampling, finite population sampling, misclassification, linkage error, sampling design
Citation Information : Statistics in Transition New Series. Volume 17, Issue 3, Pages 429-447, DOI: https://doi.org/10.21307/stattrans-2016-031
License : (CC BY 4.0)
Published Online: 06-July-2017
We consider the problem of predicting a function of misclassified binary variables. We observe that the naive predictor, which ignores the misclassification errors, is unbiased even when the total misclassification error is high, provided that the probabilities of false positives and false negatives are identical. Otherwise, the bias of the naive predictor depends on the misclassification distribution, and its magnitude can be high in certain cases. We correct the bias of the naive predictor using a double sampling idea: both inaccurate and accurate measurements of the binary variable are taken on all units of a sample drawn from the original data using a probability sampling scheme. Using this additional information and design-based sample survey theory, we derive a bias-corrected predictor. We also examine the cases in which the new bias-corrected predictors improve on the naive predictor in terms of mean square error (MSE).
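The double sampling idea can be illustrated with a small simulation. The sketch below is not the predictor derived in the paper. It assumes the target is a population total, the validation sample is drawn by simple random sampling without replacement, and the correction is a standard design-based difference estimator. All function names, parameter names, and numerical settings (N, n, p, fp, fn, reps) are hypothetical choices made for illustration.

```python
import random


def naive_total(y_star):
    """Naive predictor: sum the error-prone labels, ignoring misclassification."""
    return sum(y_star)


def bias_corrected_total(y_star, sample_idx, y_true_sample):
    """Difference-type correction (an assumed form, not the paper's predictor).

    y_star        : error-prone 0/1 labels for all N units
    sample_idx    : indices of the validation sample (assumed SRS without replacement)
    y_true_sample : mapping index -> accurate 0/1 label, known only on the sample
    """
    N = len(y_star)
    n = len(sample_idx)
    # Expansion estimate of the total misclassification difference sum(y* - y).
    diff = sum(y_star[i] - y_true_sample[i] for i in sample_idx)
    return sum(y_star) - (N / n) * diff


def simulate(N=2000, n=200, p=0.3, fp=0.20, fn=0.05, reps=300, seed=1):
    """Return (average naive error, average corrected error) over reps replications."""
    rng = random.Random(seed)
    # Fixed finite population of true binary labels.
    y = [1 if rng.random() < p else 0 for _ in range(N)]
    true_total = sum(y)
    naive_err = 0.0
    corrected_err = 0.0
    for _ in range(reps):
        # Misclassify: flip 1 -> 0 with probability fn, 0 -> 1 with probability fp.
        y_star = [
            (0 if rng.random() < fn else 1) if yi == 1
            else (1 if rng.random() < fp else 0)
            for yi in y
        ]
        s = rng.sample(range(N), n)
        y_true_sample = {i: y[i] for i in s}
        naive_err += naive_total(y_star) - true_total
        corrected_err += bias_corrected_total(y_star, s, y_true_sample) - true_total
    return naive_err / reps, corrected_err / reps
```

Calling simulate() with these illustrative settings, where false positives are far more likely than false negatives, should show a large positive average error for the naive total and an average error near zero for the corrected total. The corrected total is noisier in any single replication, which is why a comparison in terms of MSE, as the abstract notes, is a separate question from bias.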