Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Authors: Ricardo Cardoso Pereira, Miriam Seoane Santos, Pedro Pereira Rodrigues, Pedro Henriques Abreu

JAIR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.
Researcher Affiliation	Academia	Ricardo Cardoso Pereira EMAIL Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-790, Coimbra, Portugal Miriam Seoane Santos EMAIL IPO-Porto Research Centre, 4200-072, Porto, Portugal Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-790, Coimbra, Portugal Pedro Pereira Rodrigues EMAIL Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine (MEDCIDS-FMUP), University of Porto, 4200-319, Porto, Portugal Pedro Henriques Abreu EMAIL Centre for Informatics and Systems of the University of Coimbra (CISUC), Department of Informatics Engineering, University of Coimbra, 3030-790, Coimbra, Portugal
Pseudocode	No	The paper describes the theoretical background and architecture of Autoencoders and their variants, as well as extensions and training aspects, but does not present any structured pseudocode or algorithm blocks for its own methodology or the surveyed methods.
Open Source Code	No	This paper is a survey and does not present new methodology with accompanying source code. There is no statement about the release of source code or links to a repository.
Open Datasets	Yes	The vast majority of datasets used are small and medium-sized, mostly below 100,000 observations. This may be due, once again, to the computational resources and the time complexity needed to train AEs (and any deep learning model) in big data scenarios. Therefore, the generalization of these conclusions is limited to small and medium-sized datasets, while further work in big data contexts is required. The remaining works use synthetic and real-world datasets from miscellaneous contexts, the majority available at public repositories such as the UCI Machine Learning Repository.
Dataset Splits	No	The paper describes dataset splits used by the surveyed works, but does not define any dataset splits for its own analysis or experimental reproduction, as it is a survey paper.
Hardware Specification	No	The paper does not provide specific hardware details used for conducting its analysis or generating its results. Discussions about hardware are limited to the general computational demands of Autoencoders and are not related to the authors' experimental setup.
Software Dependencies	No	The paper does not mention any specific software dependencies or version numbers used for its own analysis or research process.
Experiment Setup	No	The paper is a survey and therefore does not have an experimental setup with hyperparameters or training settings for its own work. It discusses such settings and their trends in the literature of the surveyed papers.