Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Handling Missing Data via Max-Entropy Regularized Graph Autoencoder
Authors: Ziqi Gao, Yifan Niu, Jiashun Cheng, Jianheng Tang, Lanqing Li, Tingyang Xu, Peilin Zhao, Fugee Tsung, Jia Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that MEGAE outperforms all the other state-of-the-art imputation methods on a variety of benchmark datasets. |
| Researcher Affiliation | Collaboration | Ziqi Gao², Yifan Niu¹, Jiashun Cheng², Jianheng Tang², Lanqing Li³*, Tingyang Xu³, Peilin Zhao³, Fugee Tsung¹,², Jia Li¹,². ¹The Hong Kong University of Science and Technology (Guangzhou); ²The Hong Kong University of Science and Technology; ³AI Lab, Tencent |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | No statement or link provided regarding open-source code for the described methodology. |
| Open Datasets | Yes | We conduct experiments on 6 benchmark datasets (Morris et al. 2020) from different domains: (1) bioinformatics, i.e., PROTEINS_full (Borgwardt et al. 2005) and ENZYMES (Schomburg et al. 2004); (2) chemistry, i.e., QM9 (Ramakrishnan et al. 2014) and FIRSTMM_DB (Neumann et al. 2013); (3) computer vision, i.e., FRANKENSTEIN (Orsini, Frasconi, and De Raedt 2015); (4) synthesis, i.e., Synthie (Morris et al. 2016). |
| Dataset Splits | Yes | We use a 70-10-20 train-validation-test split and construct random missingness only on the test set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, cloud instances) used for the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., library or solver names with specific versions). |
| Experiment Setup | No | The paper states 'For all baselines, we use a 2-layer GCN for downstream classification' and mentions a '70-10-20 train-validation-test split' and 'After running for 5 trials', but it does not provide specific hyperparameters like learning rate, batch size, number of epochs, or optimizer settings in the main text. |
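
The protocol quoted in the Dataset Splits row (a 70-10-20 train-validation-test split, with random missingness constructed only on the test set) can be illustrated with a minimal sketch. This is not the authors' code. The function name `split_and_mask`, the `missing_rate` and `seed` parameters, and the choice to hide entries uniformly at random are all assumptions made for illustration; the paper's datasets are collections of graphs, so in practice the unit being split would be a graph rather than a row of a feature matrix.

```python
import numpy as np


def split_and_mask(features, missing_rate=0.2, seed=0):
    """Split rows 70/10/20 and hide random entries in the test rows only.

    `missing_rate` and `seed` are hypothetical parameters; the source
    does not state which values were used.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = features.shape

    # Shuffle row indices, then cut them into 70% / 10% / remainder.
    perm = rng.permutation(n_rows)
    n_train = (70 * n_rows) // 100
    n_val = (10 * n_rows) // 100
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]

    # Boolean mask: True marks an entry that is hidden from the model.
    # Only test rows can contain True values.
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    mask[test_idx] = rng.random((len(test_idx), n_cols)) < missing_rate

    # Copy as float so that hidden entries can be stored as NaN.
    masked = features.astype(float)
    masked[mask] = np.nan
    return train_idx, val_idx, test_idx, masked, mask


if __name__ == "__main__":
    toy = np.arange(500).reshape(100, 5)
    train_idx, val_idx, test_idx, masked, mask = split_and_mask(toy)
    print(len(train_idx), len(val_idx), len(test_idx))
```

An imputation method would then be scored by comparing its predictions at the positions where `mask` is true against the original values in `features`, while the training and validation rows stay fully observed.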