Licenses
This journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Advanced Missing Value Imputation Techniques: Machine Learning Methods with an Emphasis on an Ensemble Method for Multiple Imputation by Chained Equations
Mehrdad Ghaderi, Zahra Rezaei Ghahroodi*, Mina Gandomi
Abstract:

Researchers often face the problem of how to handle missing data. Multiple imputation by chained equations (MICE) is one of the most common imputation methods. In theory, any imputation model can be used to predict the missing values; however, if the predictive models are misspecified, the result can be biased estimates and invalid inferences. Among the more recent approaches to missing data are machine learning methods and SuperMICE, an ensemble (super learner) approach to MICE. In this paper, we present a set of simulations indicating that this approach produces final parameter estimates with lower bias and better coverage than other commonly used imputation methods. We also discuss applying several machine learning methods and the SuperMICE ensemble algorithm to data from the Industrial Establishment Survey, in which several variables are imputed simultaneously. Finally, we evaluate the methods and identify the one that performs best.
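
To make the chained-equations idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes numeric data with missing entries coded as NaN and uses ordinary linear regression with added residual noise as the per-variable imputation model. The function names `chained_imputation` and `multiple_imputation` and the toy data are hypothetical. A fuller implementation would also draw the regression parameters from their posterior, and an ensemble method such as SuperMICE would replace the linear model with a super learner.

```python
import numpy as np


def chained_imputation(X, n_iter=10, rng=None):
    """Return one completed copy of X (NaN = missing) using chained regressions.

    Simplifying assumption: each variable is imputed by a linear regression on
    all other variables plus Gaussian residual noise (stochastic regression).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start from a crude fill: the observed mean of each column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(n_iter):
        # Cycle through the variables, re-imputing each from the current others.
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])

            # Fit the imputation model on rows where variable j was observed.
            beta, *_ = np.linalg.lstsq(design[obs_j], filled[obs_j, j], rcond=None)
            resid = filled[obs_j, j] - design[obs_j] @ beta
            dof = max(int(obs_j.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Predict the missing entries and add noise so copies differ.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = design[miss_j] @ beta + noise
    return filled


def multiple_imputation(X, m=5, n_iter=10, seed=0):
    """Return m completed data sets; the analysis is then run on each one."""
    rng = np.random.default_rng(seed)
    return [chained_imputation(X, n_iter=n_iter, rng=rng) for _ in range(m)]


# Toy data (hypothetical): two correlated variables with values removed at random.
rng_demo = np.random.default_rng(42)
x1 = rng_demo.normal(size=200)
x2 = 2.0 * x1 + rng_demo.normal(scale=0.5, size=200)
X_demo = np.column_stack([x1, x2])
X_demo[rng_demo.random(200) < 0.2, 1] = np.nan
X_demo[rng_demo.random(200) < 0.1, 0] = np.nan

imputations = multiple_imputation(X_demo, m=5)
```

Each element of `imputations` is a completed data set. In a multiple-imputation analysis, the model of interest would be fitted to each copy and the estimates combined across copies.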

Keywords: Missing data, Super learner ensemble algorithm, Multiple imputation by chained equations, Machine learning methods, Industrial establishment survey
Type of Study: Applied | Subject: Applied Statistics
Received: 2024/10/13 | Accepted: 2024/08/31
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Journal of Statistical Sciences: scientific research journal of the Iranian Statistical Society
