Improving the imputation of missing data with Markov Blanket discovery

Authors: Yang Liu, Anthony Constantinou

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across different case studies and multiple imputation algorithms show that the proposed solution improves imputation accuracy, both under random and systematic missingness. We evaluate the effectiveness of MBMF on both synthetic and real-world data sets. The empirical experiments show that MBMF outperforms MF, and other relevant state-of-the-art imputation algorithms, under most experiments.
Researcher Affiliation | Academia | Yang Liu, Anthony C. Constantinou; Machine Intelligence and Decision Systems (MInDS) Research Group, Queen Mary University of London; {yangliu, a.constantinou}@qmul.ac.uk
Pseudocode | Yes | Algorithm 1, the Grow and Shrink (GS) algorithm with test-wise deletion, and Algorithm 2, Markov Blanket-based Feature Selection (MBFS). (A hedged sketch of the GS procedure is given after this table.)
Open Source Code | Yes | The implementation of MBMF, described in this paper, is available at: https://github.com/Enderlogic/Markov-Blanket-based-Feature-Selection.
Open Datasets | Yes | We first evaluate the algorithms by applying them to synthetic data sampled from three BNs, ECOLI70, MAGIC-IRRI and ARTH150, taken from the bnlearn repository (Scutari, 2010). We repeat the evaluation by applying the imputation algorithms to six real-world data sets retrieved from the UCI data repository (Dua & Graff, 2017). URL: http://archive.ics.uci.edu/ml.
Dataset Splits | No | The paper describes generating complete and incomplete datasets and evaluating imputation accuracy against the complete data, but it does not specify explicit training, validation, or test dataset splits (e.g., in percentages or sample counts) for model training or evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or cloud instance types.
Software Dependencies | No | The paper mentions the software packages used for the experiments (e.g., scikit-learn, the missForest R package, the softImpute R package) but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | MBMF is applied to continuous data using Pearson's correlation test for CI tests, and to categorical data using the G-test statistic, both of which are the default choices for GS. We also consider the default threshold for independence, which is 0.1 for CI p-value tests. The other algorithms are also tested with their default hyper-parameters as implemented in their corresponding packages listed above. (An illustrative sketch of such CI tests is given after this table.)
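
Note on the Pseudocode row: Algorithm 1 in the paper is the Grow and Shrink routine run with test-wise deletion, i.e. each conditional-independence (CI) test uses only the rows that are fully observed for the variables involved in that particular test. The following is a minimal Python sketch of that idea under stated assumptions; the function name grow_shrink, the pandas-based data handling, and the pluggable ci_test callback are illustrative choices, not the authors' implementation (see the repository linked above for that).

```python
import pandas as pd


def grow_shrink(data: pd.DataFrame, target: str, ci_test, alpha: float = 0.1):
    """Sketch of Grow-Shrink Markov blanket discovery with test-wise
    deletion: every CI test uses only the rows that are fully observed
    for the variables taking part in that test."""
    mb = []                                    # candidate Markov blanket
    others = [v for v in data.columns if v != target]

    def testwise(cols):
        # Test-wise deletion: drop only rows missing a value in the
        # columns used by this particular CI test.
        return data[cols].dropna()

    # Grow phase: add any variable dependent on the target given the
    # current blanket, repeating until nothing changes.
    changed = True
    while changed:
        changed = False
        for x in others:
            if x in mb:
                continue
            sub = testwise([target, x] + mb)
            if ci_test(sub, target, x, mb) < alpha:    # dependent -> grow
                mb.append(x)
                changed = True

    # Shrink phase: remove variables that become independent of the
    # target once the rest of the blanket is conditioned on.
    for x in list(mb):
        cond = [v for v in mb if v != x]
        sub = testwise([target, x] + cond)
        if ci_test(sub, target, x, cond) >= alpha:     # independent -> shrink
            mb.remove(x)

    return mb
```

Algorithm 2 (MBFS) would then use Markov blankets discovered this way to select, for each incomplete variable, the features handed to the downstream imputer; that step is not reproduced here.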
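Note on the Experiment Setup row: the quoted defaults (Pearson's correlation test for continuous data, the G-test for categorical data, and a 0.1 p-value threshold) can be approximated with standard libraries. The sketch below is an assumption-based stand-in built on scipy and pandas, not code from the MBMF package; fisher_z_test extends the Pearson test to a conditioning set via partial correlation, and g_test is the marginal (unconditional) G-test only.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.1  # the paper's default p-value threshold for independence


def fisher_z_test(df: pd.DataFrame, x: str, y: str, z=()) -> float:
    """Pearson-correlation CI test for continuous data, extended to a
    conditioning set z via partial correlation and Fisher's z transform.
    Returns a p-value and can be plugged into the grow_shrink sketch above."""
    sub = df[[x, y] + list(z)].dropna()            # test-wise deletion
    n = len(sub)
    prec = np.linalg.pinv(sub.corr().values)       # inverse correlation matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.999999, 0.999999))
    stat = np.sqrt(max(n - len(z) - 3, 1)) * 0.5 * np.log((1 + r) / (1 - r))
    return 2 * (1 - stats.norm.cdf(abs(stat)))


def g_test(df: pd.DataFrame, x: str, y: str) -> float:
    """Marginal G-test (log-likelihood ratio) for categorical data.
    A full conditional version would stratify the contingency table over
    the conditioning set before pooling the statistics."""
    table = pd.crosstab(df[x], df[y])
    _, p_value, _, _ = stats.chi2_contingency(table, lambda_="log-likelihood")
    return p_value


# Example (hypothetical data frame df and target column "A"):
# mb = grow_shrink(df, target="A", ci_test=fisher_z_test, alpha=ALPHA)
```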