Simultaneous Missing Value Imputation and Structure Learning with Groups

Authors: Pablo Morales-Alvarez, Wenbo Gong, Angus Lamb, Simon Woodhead, Simon Peyton Jones, Nick Pawlowski, Miltiadis Allamanis, Cheng Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we conduct extensive experiments on synthetic, semi-synthetic, and real-world education data sets.
Researcher Affiliation | Collaboration | Pablo Morales-Alvarez (University of Granada); Wenbo Gong (Microsoft Research); Angus Lamb (G-Research); Simon Woodhead (Eedi); Simon Peyton Jones (Epic Games); Nick Pawlowski (Microsoft Research); Miltiadis Allamanis (Google); Cheng Zhang (Microsoft Research)
Pseudocode | Yes | Algorithm 1 Generative process
Open Source Code | Yes | We will provide the main model code in the supplemental material. The full running code will be released after acceptance.
Open Datasets | Yes | We evaluate our method using a benchmark in healthcare applications [53].
Dataset Splits | No | For each simulated dataset, we simulate 5000 training and 1000 test samples. The train and test sets have 1000 and 500 patients, respectively. There is no explicit mention of a validation set or split percentages for it.
Hardware Specification | Yes | All experiments were conducted on NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions general training details like the Adam optimizer with a learning rate of 0.001, but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use Adam with learning rate of 0.001. We use a batch size of 128. For the synthetic and neuropathic pain dataset, the models are trained for 200 epochs and for the Eedi data, we train for 500 epochs. For the VISL model, the dimension of the latent variable for each group is 1. The GNN in the decoder has 3 layers, and we run 3 message passing steps. The MLPs are 2-layer with 64 hidden units for each layer. The DAG regulariser strength λ is 0.01 for the synthetic dataset and 0.001 for the Neuropathic Pain and Eedi dataset.
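
The hyperparameters reported under Experiment Setup can be gathered into a single configuration object for anyone attempting a reproduction. The following is a minimal sketch only, not the authors' code: the class name `VISLConfig`, its field names, and the dataset keys `"synthetic"`, `"neuropathic_pain"`, and `"eedi"` are hypothetical labels chosen for illustration. Only the numeric values come from the summary above.

```python
from dataclasses import dataclass

# Per-dataset values as reported in the Experiment Setup row.
# The dictionary keys are hypothetical labels, not names from the paper's code.
EPOCHS = {"synthetic": 200, "neuropathic_pain": 200, "eedi": 500}
DAG_LAMBDA = {"synthetic": 0.01, "neuropathic_pain": 0.001, "eedi": 0.001}


@dataclass(frozen=True)
class VISLConfig:
    """Hypothetical container for the reported training settings."""

    dataset: str
    optimizer: str = "adam"            # Adam optimizer
    learning_rate: float = 0.001       # reported learning rate
    batch_size: int = 128              # reported batch size
    latent_dim_per_group: int = 1      # latent variable dimension per group
    gnn_layers: int = 3                # GNN layers in the decoder
    message_passing_steps: int = 3     # message passing steps
    mlp_layers: int = 2                # MLP depth
    mlp_hidden_units: int = 64         # hidden units per MLP layer

    def __post_init__(self) -> None:
        # Reject dataset labels that have no reported settings.
        if self.dataset not in EPOCHS:
            raise ValueError(f"unknown dataset label: {self.dataset!r}")

    @property
    def epochs(self) -> int:
        # 200 epochs for synthetic and neuropathic pain, 500 for Eedi.
        return EPOCHS[self.dataset]

    @property
    def dag_lambda(self) -> float:
        # DAG regulariser strength: 0.01 for synthetic, 0.001 otherwise.
        return DAG_LAMBDA[self.dataset]


if __name__ == "__main__":
    for name in EPOCHS:
        cfg = VISLConfig(dataset=name)
        print(name, cfg.epochs, cfg.dag_lambda)
```

Keeping the dataset-dependent values (epochs and λ) as derived properties rather than free fields makes it harder to pair, say, the Eedi epoch count with the synthetic regulariser strength by accident.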