HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Authors: Daniel Jarrett, Bogdan C Cebere, Tennison Liu, Alicia Curth, Mihaela van der Schaar

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks.
Researcher Affiliation | Academia | (1) Department of Applied Mathematics & Theoretical Physics, University of Cambridge, UK; (2) Department of Electrical Engineering, University of California, Los Angeles, USA.
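Pseudocode | Yes | Algorithm 1: HyperImpute
    Parameters: global set of models & hyperparameters $\mathcal{A}$, ModelSearch function, BaselineImpute function, imputation stop criterion $\gamma$, selection skip criterion $\sigma$, column visitation order $\pi$
    Input: incomplete dataset $\mathcal{D} := \{(X^n, M^n)\}_{n=1}^{N}$
    Output: imputed dataset $\hat{\mathcal{D}} := \{\hat{X}^n\}_{n=1}^{N}$
    Initialize: $\hat{\mathcal{D}} \leftarrow \mathrm{BaselineImpute}(\mathcal{D})$
    while $\gamma$ is False do  (keep imputing?)
        for column $d$ in visitation order $\pi$ do
            $\hat{\mathcal{D}}^{\mathrm{obs}}_{-d} := \{\hat{X}^n_{-d}\}_{n:\, M^n_d = 1}$   (5)
            $\mathcal{D}^{\mathrm{obs}}_{d} := \{\hat{X}^n_{d}\}_{n:\, M^n_d = 1}$   (6)
            if $\sigma$ is False then  (keep selecting?)
                $a_d \leftarrow \mathrm{ModelSearch}(\mathcal{D}^{\mathrm{obs}}_{d}, \hat{\mathcal{D}}^{\mathrm{obs}}_{-d}, \mathcal{A})$
            $h_d \leftarrow a_d.\mathrm{train}(\mathcal{D}^{\mathrm{obs}}_{d}, \hat{\mathcal{D}}^{\mathrm{obs}}_{-d}, H_d)$
            $\hat{\mathcal{D}}^{\mathrm{mis}}_{-d} := \{\hat{X}^n_{-d}\}_{n:\, M^n_d = 0}$   (7)
            $\hat{\mathcal{D}}^{\mathrm{mis}}_{d} := \{\hat{X}^n_{d}\}_{n:\, M^n_d = 0} \leftarrow h_d.\mathrm{impute}(\hat{\mathcal{D}}^{\mathrm{mis}}_{-d})$   (8)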
Open Source Code | Yes | https://github.com/vanderschaarlab/hyperimpute
Open Datasets | Yes | We employ 12 real-world datasets from the UCI machine learning repository [72].
Dataset Splits | No | The paper describes simulating missingness on the datasets for evaluation but does not specify a train/validation/test split for the datasets used in its main experiments. Cross-validation is used internally, for column-wise model selection within the imputation process.
Hardware Specification | Yes | Laptop hardware: 32 GB RAM, Intel Core i7-6700HQ, GeForce GTX 950M.
Software Dependencies | No | The paper mentions software packages such as sklearn, xgboost, catboost, and pytorch but does not specify their version numbers.
Experiment Setup | Yes | In Table 5, we present the full configuration space (models and associated hyperparameter ranges) we consider for the column-wise model selection within HyperImpute.
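To make the loop in Algorithm 1 concrete, here is a minimal NumPy sketch under simplifying assumptions. It is not the authors' implementation and not the API of the hyperimpute package. Every name below (CANDIDATES, fit, model_search, hyper_impute_sketch) is hypothetical, and so are these choices: all columns are continuous; BaselineImpute is column-mean imputation; the model space A is replaced by a small candidate set (a constant predictor and ridge regressions with four penalty values) rather than the Table 5 configuration space; ModelSearch picks the candidate with the lowest 3-fold cross-validated squared error on rows where column d is observed; the stop criterion γ is a sweep cap or a change below a tolerance; the skip criterion σ stops re-selection after the first sweep; and π visits columns from fewest to most missing entries. The demo uses an MCAR mask only for illustration and does not reproduce the paper's missingness settings.

```python
import numpy as np

# Hypothetical stand-in for the model space A: a constant predictor plus ridge
# regressions with four penalty values (NOT the Table 5 configuration space).
CANDIDATES = [("mean", None)] + [("ridge", lam) for lam in (0.01, 0.1, 1.0, 10.0)]


def fit(candidate, X, y):
    """Train one candidate on features X and target y; return a predictor."""
    kind, lam = candidate
    if kind == "mean":
        mu = y.mean()
        return lambda Z: np.full(len(Z), mu)
    # Closed-form ridge regression on centered data (intercept left unpenalized).
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean


def model_search(X, y, k=3, seed=0):
    """Hypothetical ModelSearch: the candidate with the lowest k-fold CV MSE wins."""
    idx = np.arange(len(y))
    folds = np.array_split(np.random.default_rng(seed).permutation(idx), k)

    def cv_mse(candidate):
        errors = []
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            h = fit(candidate, X[train], y[train])
            errors.append(np.mean((h(X[fold]) - y[fold]) ** 2))
        return np.mean(errors)

    return min(CANDIDATES, key=cv_mse)


def hyper_impute_sketch(X, max_iter=10, tol=1e-4, select_sweeps=1):
    """Column-wise iterative imputation loosely following Algorithm 1.

    X is a 2-D float array in which np.nan marks a missing entry (M = 0).
    Assumes every column has at least one observed value.
    """
    M = ~np.isnan(X)                                  # True where observed
    X_hat = np.where(M, X, np.nanmean(X, axis=0))     # BaselineImpute: column means
    order = np.argsort((~M).sum(axis=0))              # pi: fewest missing first
    chosen = {}
    for sweep in range(max_iter):                     # gamma: sweep cap ...
        previous = X_hat.copy()
        for d in order:
            miss = ~M[:, d]
            if not miss.any():
                continue
            rest = np.delete(X_hat, d, axis=1)        # the "-d" features
            if sweep < select_sweeps:                 # sigma: select only in early sweeps
                chosen[d] = model_search(rest[~miss], X_hat[~miss, d])
            h = fit(chosen[d], rest[~miss], X_hat[~miss, d])
            X_hat[miss, d] = h(rest[miss])            # overwrite only missing cells
        if np.max(np.abs(X_hat - previous)) < tol:    # ... or small change
            break
    return X_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    X_true = np.column_stack(
        [z, 2 * z + 0.1 * rng.normal(size=200), -z + 0.1 * rng.normal(size=200)]
    )
    X_obs = np.where(rng.random(X_true.shape) < 0.2, np.nan, X_true)  # MCAR mask
    X_imp = hyper_impute_sketch(X_obs)
    mis = np.isnan(X_obs)
    print("RMSE on missing entries:", np.sqrt(np.mean((X_imp[mis] - X_true[mis]) ** 2)))
```

In this sketch, fixing each column's model after the first sweep means the cross-validated search runs once per column, while later sweeps only retrain the chosen model on the progressively refined imputations of the other columns.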