Improving the imputation of missing data with Markov Blanket discovery
Authors: Yang Liu, Anthony Constantinou
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across different case studies and multiple imputation algorithms show that the proposed solution improves imputation accuracy under both random and systematic missingness. We evaluate the effectiveness of MBMF on both synthetic and real-world data sets. The empirical experiments show that MBMF outperforms MF and other relevant state-of-the-art imputation algorithms in most experiments. |
| Researcher Affiliation | Academia | Yang Liu, Anthony C. Constantinou, Machine Intelligence and Decision Systems (MInDS) Research Group, Queen Mary University of London, {yangliu, a.constantinou}@qmul.ac.uk |
| Pseudocode | Yes | Algorithm 1 (the Grow and Shrink (GS) algorithm with test-wise deletion) and Algorithm 2 (Markov Blanket-based Feature Selection, MBFS); hedged sketches of the GS procedure and of blanket-restricted imputation follow this table. |
| Open Source Code | Yes | The implementation of MBMF, described in this paper, is available at: https://github.com/Enderlogic/Markov-Blanket-based-Feature-Selection. |
| Open Datasets | Yes | We first evaluate the algorithms by applying them to synthetic data sampled from three BNs, ECOLI70, MAGIC-IRRI and ARTH150, taken from the bnlearn repository (Scutari, 2010). We repeat the evaluation by applying the imputation algorithms to six real-world data sets retrieved from the UCI data repository (Dua & Graff, 2017), http://archive.ics.uci.edu/ml. |
| Dataset Splits | No | The paper describes generating complete and incomplete datasets and evaluating imputation accuracy against complete data, but it does not specify explicit training, validation, or test dataset splits (e.g., in percentages or sample counts) for model training or evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or cloud instance types. |
| Software Dependencies | No | The paper mentions software packages used for experiments (e.g., scikit-learn, the missForest R package, the softImpute R package) but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | MBMF is applied to continuous data using Pearson's correlation test for CI tests, and to categorical data using the G-test statistic, both of which are the default choices for GS. We also consider the default threshold for independence, which is 0.1 for the CI p-value tests. The other algorithms are tested with their default hyper-parameters as implemented in the corresponding packages listed above. (These defaults are mirrored in the sketches after this table.) |
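
The two algorithms named in the Pseudocode row are not reproduced verbatim here. As a point of reference, below is a minimal Python sketch of the standard Grow-Shrink procedure with the two details the table quotes: test-wise deletion of rows that are missing any variable involved in a conditional-independence test, and a Pearson (Fisher's z) CI test with the default independence threshold of 0.1. The function names (`partial_corr`, `ci_test`, `grow_shrink`) and the regression-residual implementation of partial correlation are illustrative assumptions rather than the authors' released code.

```python
import numpy as np
from scipy import stats

def partial_corr(data, x, y, z):
    """Partial Pearson correlation of columns x and y given the columns in z,
    computed from the residuals of least-squares regressions on z."""
    if len(z) == 0:
        return np.corrcoef(data[:, x], data[:, y])[0, 1]
    Z = np.column_stack([data[:, z], np.ones(len(data))])
    rx = data[:, x] - Z @ np.linalg.lstsq(Z, data[:, x], rcond=None)[0]
    ry = data[:, y] - Z @ np.linalg.lstsq(Z, data[:, y], rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

def ci_test(data, x, y, z, alpha=0.1):
    """Fisher's z test of X independent of Y given Z, with test-wise deletion:
    rows missing any involved variable are dropped before testing.
    Returns True when independence is NOT rejected at level alpha."""
    z = list(z)
    cols = [x, y] + z
    sub = data[~np.isnan(data[:, cols]).any(axis=1)]
    n = sub.shape[0]
    if n < len(z) + 4:
        return True  # too few complete rows to test; treat as independent
    r = np.clip(partial_corr(sub, x, y, z), -0.9999, 0.9999)
    z_stat = np.sqrt(n - len(z) - 3) * abs(0.5 * np.log((1 + r) / (1 - r)))
    return 2 * stats.norm.sf(z_stat) > alpha

def grow_shrink(data, target, alpha=0.1):
    """Standard Grow-Shrink (GS) Markov blanket discovery for one target column."""
    n_vars = data.shape[1]
    mb = set()
    changed = True
    while changed:  # grow phase: add variables dependent on the target given the blanket
        changed = False
        for x in range(n_vars):
            if x != target and x not in mb and not ci_test(data, x, target, mb, alpha):
                mb.add(x)
                changed = True
    for x in list(mb):  # shrink phase: drop variables independent given the rest
        if ci_test(data, x, target, mb - {x}, alpha):
            mb.remove(x)
    return mb
```

Because `ci_test` applies test-wise deletion, the sketch can be run directly on data containing `np.nan` entries, mirroring the role test-wise deletion plays in the paper's Algorithm 1.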
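
In the same hedged spirit, the sketch below shows one way Markov blanket-based feature selection can be combined with a MissForest-style iterative random-forest imputer, which is the general idea behind MBMF: each incomplete column is predicted only from its discovered blanket rather than from all other columns. The helper name `impute_with_markov_blankets`, the column-mean initialisation, and the fixed iteration count are assumptions for illustration; the authors' actual implementation is in the repository linked above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def impute_with_markov_blankets(data, blankets, n_iter=5):
    """Iteratively impute np.nan entries, predicting each column only from the
    predictor columns listed in `blankets` (column index -> set of indices),
    e.g. the blankets returned by the grow_shrink sketch above."""
    X = data.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(data, axis=0)
    for j in range(X.shape[1]):  # mean-initialise so every regression sees complete inputs
        X[missing[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            preds = sorted(blankets.get(j, set()))
            if not rows.any() or not preds:
                continue  # nothing to impute, or no blanket to predict from
            model = RandomForestRegressor(n_estimators=100, random_state=0)
            model.fit(X[~rows][:, preds], X[~rows, j])
            X[rows, j] = model.predict(X[rows][:, preds])
    return X
```

A caller would first run `grow_shrink` once per column on the incomplete data to obtain `blankets`, then pass the result here; continuous data is assumed throughout, matching the Pearson-test branch of the experiment setup.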