HyperImpute: Generalized Iterative Imputation with Automatic Model Selection
Authors: Daniel Jarrett, Bogdan C Cebere, Tennison Liu, Alicia Curth, Mihaela van der Schaar
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks. |
| Researcher Affiliation | Academia | ¹Department of Applied Mathematics & Theoretical Physics, University of Cambridge, UK. ²Department of Electrical Engineering, University of California, Los Angeles, USA. |
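| Pseudocode | Yes | Algorithm 1: HyperImpute<br>Parameters: global set of models and hyperparameters $A$, ModelSearch function, BaselineImpute function, imputation stop criterion $\gamma$, selection skip criterion $\sigma$, column visitation order $\pi$<br>Input: incomplete dataset $D := \{(\tilde{X}^n, M^n)\}_{n=1}^{N}$<br>Output: imputed dataset $\hat{D} := \{\hat{X}^n\}_{n=1}^{N}$<br>Initialize: $\hat{D} \leftarrow \text{BaselineImpute}(D)$<br>while $\gamma$ is False do ▷ keep imputing?<br>&emsp;for column $d \in$ visitation order $\pi$ do<br>&emsp;&emsp;$\hat{D}^{\text{obs}}_{-d} := \{\hat{X}^n_{-d}\}_{n : M^n_d = 1}$ (5)<br>&emsp;&emsp;$D^{\text{obs}}_{d} := \{\tilde{X}^n_{d}\}_{n : M^n_d = 1}$ (6)<br>&emsp;&emsp;if $\sigma$ is False then ▷ keep selecting?<br>&emsp;&emsp;&emsp;$a_d \leftarrow \text{ModelSearch}(D^{\text{obs}}_{d}, \hat{D}^{\text{obs}}_{-d}, A)$<br>&emsp;&emsp;$h_d \leftarrow a_d.\text{train}(D^{\text{obs}}_{d}, \hat{D}^{\text{obs}}_{-d}, H_d)$<br>&emsp;&emsp;$\hat{D}^{\text{mis}}_{-d} := \{\hat{X}^n_{-d}\}_{n : M^n_d = 0}$ (7)<br>&emsp;&emsp;$\hat{D}^{\text{mis}}_{d} := \{\hat{X}^n_{d}\}_{n : M^n_d = 0} \leftarrow h_d.\text{impute}(\hat{D}^{\text{mis}}_{-d})$ (8) |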
| Open Source Code | Yes | https://github.com/vanderschaarlab/hyperimpute |
| Open Datasets | Yes | We employ 12 real-world datasets from the UCI machine learning repository [72] |
| Dataset Splits | No | The paper describes simulating missingness on datasets for evaluation but does not specify a global train/validation/test split for the datasets used in its main experiments. It uses cross-validation internally for model selection within the imputation process. |
| Hardware Specification | Yes | Laptop hardware: 32GB RAM, Intel Core i7-6700HQ, GeForce GTX 950M. |
| Software Dependencies | No | The paper mentions software packages like 'sklearn', 'xgboost', 'catboost', and 'pytorch' but does not specify their version numbers. |
| Experiment Setup | Yes | In Table 5, we present the full configuration space (models and associated hyperparameter ranges) we consider for the column-wise model selection within HyperImpute. |
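
The loop in the Pseudocode row (baseline imputation, then a per-column model search on the observed rows, then imputation of the missing rows) can be illustrated with a minimal, standard-library-only sketch. This is not the `hyperimpute` package's API or the authors' implementation. All names here (`MeanModel`, `OneFeatureLinear`, `cv_mse`, `hyperimpute_sketch`) are hypothetical, and several choices are assumptions made for illustration: the candidate set is two toy regressors rather than the configuration space in the paper's Table 5, the baseline is column-mean imputation, the visitation order is left to right, the stop criterion is an iteration budget plus a change tolerance, and model selection is repeated on every pass (the skip criterion is never triggered). It also assumes numeric data with at least one observed value per column.

```python
from statistics import mean


class MeanModel:
    """Hypothetical candidate: ignores features, predicts the training mean."""

    def fit(self, X, y):
        self.mu = mean(y)
        return self

    def predict(self, X):
        return [self.mu for _ in X]


class OneFeatureLinear:
    """Hypothetical candidate: least-squares line on feature index j."""

    def __init__(self, j):
        self.j = j

    def fit(self, X, y):
        xs = [row[self.j] for row in X]
        mx, my = mean(xs), mean(y)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.j] for row in X]


def cv_mse(make_model, X, y, folds=3):
    """Mean squared error under simple interleaved k-fold cross-validation."""
    n, sq_err = len(y), 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test))
    return sq_err / n


def hyperimpute_sketch(data, n_iters=5, tol=1e-6):
    """data: list of equal-length rows; None marks a missing entry."""
    n, D = len(data), len(data[0])
    mask = [[v is not None for v in row] for row in data]  # True = observed
    # Baseline imputation (assumed here to be the observed column mean).
    col_mean = [mean(data[i][d] for i in range(n) if mask[i][d]) for d in range(D)]
    filled = [[row[d] if mask[i][d] else col_mean[d] for d in range(D)]
              for i, row in enumerate(data)]
    chosen = {}
    for _ in range(n_iters):  # stop criterion: budget or negligible change
        change = 0.0
        for d in range(D):  # visitation order: left to right
            obs = [i for i in range(n) if mask[i][d]]
            mis = [i for i in range(n) if not mask[i][d]]
            if not mis:
                continue
            others = [k for k in range(D) if k != d]
            X_obs = [[filled[i][k] for k in others] for i in obs]
            y_obs = [filled[i][d] for i in obs]
            # Model search: pick the candidate with the lowest CV error.
            candidates = [MeanModel]
            candidates += [lambda j=j: OneFeatureLinear(j) for j in range(len(others))]
            if len(obs) < 2:
                best = MeanModel  # too few observed rows to cross-validate
            else:
                best = min(candidates, key=lambda c: cv_mse(c, X_obs, y_obs))
            model = best().fit(X_obs, y_obs)
            chosen[d] = type(model).__name__
            # Impute the missing rows of column d from the other columns.
            X_mis = [[filled[i][k] for k in others] for i in mis]
            for i, p in zip(mis, model.predict(X_mis)):
                change = max(change, abs(filled[i][d] - p))
                filled[i][d] = p
        if change < tol:
            break
    return filled, chosen
```

Calling `hyperimpute_sketch(rows)` on a list of numeric rows with `None` for missing entries returns the completed rows together with a dictionary recording which candidate was selected for each column that had missing values. The input is not modified.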