reproducibilityindex.ai

HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Authors: Daniel Jarrett, Bogdan C Cebere, Tennison Liu, Alicia Curth, Mihaela van der Schaar

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks.
Researcher Affiliation	Academia	(1) Department of Applied Mathematics & Theoretical Physics, University of Cambridge, UK. (2) Department of Electrical Engineering, University of California, Los Angeles, USA.
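Pseudocode	Yes	Algorithm 1: HyperImpute. Parameters: Global set of models & hyperparameters $A$, ModelSearch function, BaselineImpute function, Imputation stop criterion $\gamma$, Selection skip criterion $\sigma$, Column visitation order $\pi$. Input: Incomplete dataset $D := \{(X^n, M^n)\}_{n=1}^{N}$. Output: Imputed dataset $\hat{D} := \{\hat{X}^n\}_{n=1}^{N}$. Initialize: $\hat{D} \leftarrow \mathrm{BaselineImpute}(D)$. while $\gamma$ is False do (keep imputing?): for column $d \in$ visitation order $\pi$ do: $\hat{D}^{\mathrm{obs}}_{-d} := \{\hat{X}^n_{-d}\}_{n: M^n_d = 1}$ (5); $D^{\mathrm{obs}}_{d} := \{\hat{X}^n_{d}\}_{n: M^n_d = 1}$ (6); if $\sigma$ is False then (keep selecting?): $a_d \leftarrow \mathrm{ModelSearch}(D^{\mathrm{obs}}_{d}, \hat{D}^{\mathrm{obs}}_{-d}, A)$; $h_d \leftarrow a_d.\mathrm{train}(D^{\mathrm{obs}}_{d}, \hat{D}^{\mathrm{obs}}_{-d}, H_d)$; $\hat{D}^{\mathrm{mis}}_{-d} := \{\hat{X}^n_{-d}\}_{n: M^n_d = 0}$ (7); $\hat{D}^{\mathrm{mis}}_{d} := \{\hat{X}^n_{d}\}_{n: M^n_d = 0} \leftarrow h_d.\mathrm{impute}(\hat{D}^{\mathrm{mis}}_{-d})$ (8).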
Open Source Code	Yes	https://github.com/vanderschaarlab/hyperimpute
Open Datasets	Yes	We employ 12 real-world datasets from the UCI machine learning repository [72]
Dataset Splits	No	The paper describes simulating missingness on datasets for evaluation but does not specify a global train/validation/test split for the datasets used in its main experiments. It uses cross-validation internally for model selection within the imputation process.
Hardware Specification	Yes	Laptop hardware: 32GB RAM, Intel Core i7-6700HQ, GeForce GTX 950M.
Software Dependencies	No	The paper mentions software packages like 'sklearn', 'xgboost', 'catboost', and 'pytorch' but does not specify their version numbers.
Experiment Setup	Yes	In Table 5, we present the full configuration space (models and associated hyperparameter ranges) we consider for the column-wise model selection within HyperImpute.
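
The loop quoted in the Pseudocode row (Algorithm 1) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation, which lives in the repository listed under Open Source Code. It assumes NumPy only, uses column-mean imputation as the baseline, closed-form ridge regression as the single model family (with the penalty alpha as the only hyperparameter), 3-fold cross-validation as a stand-in for ModelSearch, left-to-right column order for π, and model selection on the first sweep only as one possible skip criterion σ. The names fit_ridge, model_search, and iterative_impute are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept.

    Returns a function that predicts targets for new feature rows.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean


def model_search(X, y, alphas, n_folds=3, seed=0):
    """Hypothetical stand-in for ModelSearch: lowest k-fold CV squared error wins."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)

    def cv_error(alpha):
        err = 0.0
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            predict = fit_ridge(X[train], y[train], alpha)
            err += np.sum((predict(X[val]) - y[val]) ** 2)
        return err

    return min(alphas, key=cv_error)


def iterative_impute(X_incomplete, alphas=(1e-3, 1.0, 100.0), max_iter=10, tol=1e-4):
    """Column-wise iterative imputation with per-column model selection."""
    mask = ~np.isnan(X_incomplete)  # M: True where a value is observed
    # Baseline imputation (assumption): fill missing entries with column means.
    X_hat = np.where(mask, X_incomplete, np.nanmean(X_incomplete, axis=0))
    chosen = {}  # selected alpha per column
    for sweep in range(max_iter):  # stop criterion: max sweeps or small change
        X_prev = X_hat.copy()
        for d in range(X_hat.shape[1]):  # visitation order: left to right
            obs, mis = mask[:, d], ~mask[:, d]
            if not mis.any():
                continue  # nothing to impute in this column
            others = np.delete(X_hat, d, axis=1)  # all columns except d
            if sweep == 0:  # skip criterion (assumption): select once
                chosen[d] = model_search(others[obs], X_hat[obs, d], alphas)
            predict = fit_ridge(others[obs], X_hat[obs, d], chosen[d])
            X_hat[mis, d] = predict(others[mis])  # overwrite missing entries only
        if np.max(np.abs(X_hat - X_prev)) < tol:
            break
    return X_hat, chosen


# Usage on synthetic data with values removed completely at random.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 3))
X_true[:, 2] = X_true[:, 0] + 0.5 * X_true[:, 1] + 0.1 * rng.normal(size=200)
X_miss = X_true.copy()
X_miss[rng.random(X_true.shape) < 0.2] = np.nan  # roughly 20% missing
X_imp, chosen = iterative_impute(X_miss)
print("selected alpha per column:", chosen)
```

Observed entries are never modified; only positions where the mask marks a value as missing are overwritten on each sweep. Replacing fit_ridge with a richer pool of learners, and model_search with a search over that pool, would move the sketch closer to the column-wise selection the paper describes.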