How to Deal with Missing Categorical Data: Test of a Simple Bayesian Method

T. B. ASTEBRO, G. Chen

Organizational Research Methods

July 2003, vol. 6, no. 3, pp. 309-327

Departments: Economics & Decision Sciences, GREGHEC (CNRS)

Keywords: Missing data, Categorical variable, Bayesian, Imputation

The authors analyze the efficiency of six missing data techniques for categorical item nonresponse under the assumption that data are missing at random or missing completely at random. By efficiency, the authors mean a procedure that produces an unbiased estimate of true sample properties that is also easy to implement. The investigated techniques include listwise deletion, mode substitution, random imputation, two regression imputations, and a Bayesian model-based procedure. The authors analyze efficiency under six experimental conditions for a survey-based data set. They find that listwise deletion is efficient for the data analyzed. If data loss due to listwise deletion is an issue, the analysis points to the Bayesian method. Regression imputation is also efficient, but the result is conditioned on the specific data structure and may not hold in general. Additional problems arise when using regression imputation, making it less appropriate.
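
To make the three simplest techniques named in the abstract concrete, the following is a minimal Python sketch of listwise deletion, mode substitution, and random imputation for a categorical variable. It is an illustration only and not the authors' implementation: the function names, the use of `None` as the missing-value marker, and the toy survey rows are all assumptions made for this example. The regression imputations and the Bayesian model-based procedure are not reproduced here, because the abstract does not specify them.

```python
import random
from collections import Counter

MISSING = None  # assumption: a missing response is encoded as None


def listwise_delete(rows):
    """Keep only the rows in which every item has an observed value."""
    return [row for row in rows if all(v is not MISSING for v in row.values())]


def mode_substitute(values):
    """Replace each missing value with the most frequent observed category."""
    observed = [v for v in values if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in values]


def random_impute(values, rng):
    """Replace each missing value with a random draw from the observed values.

    Drawing from the observed responses means each category is chosen
    with probability equal to its observed relative frequency.
    """
    observed = [v for v in values if v is not MISSING]
    return [rng.choice(observed) if v is MISSING else v for v in values]


# Hypothetical toy survey data, invented for illustration.
rows = [
    {"industry": "tech", "funded": "yes"},
    {"industry": None, "funded": "no"},
    {"industry": "tech", "funded": None},
    {"industry": "retail", "funded": "no"},
    {"industry": "tech", "funded": "no"},
]

complete_rows = listwise_delete(rows)
industry = [row["industry"] for row in rows]
industry_mode = mode_substitute(industry)
industry_random = random_impute(industry, random.Random(0))
```

In this toy data, listwise deletion discards two of the five rows, which illustrates the data loss the abstract mentions. Mode substitution and random imputation keep every row, but each alters the observed category distribution in its own way.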