ReMasker: Imputing Tabular Data with Masked Autoencoding

Authors: Tianyu Du, Luca Melis, Ting Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With extensive evaluation on 12 benchmark datasets under various missingness scenarios, we show that REMASKER performs on par with or outperforms 13 popular methods in terms of both imputation fidelity and utility, while its performance advantage often increases with the ratio of missing data. We further explore the theoretical explanation for its effectiveness.
Researcher Affiliation | Collaboration | Tianyu Du (Zhejiang University), Luca Melis (Meta), Ting Wang (Penn State; Stony Brook University)
Pseudocode | Yes | Algorithm 1: REMASKER
Open Source Code | Yes | Code available at https://github.com/alps-lab/remasker
Open Datasets | Yes | For reproducibility and comparability, similar to the prior work (Yoon et al., 2019; Jarrett et al., 2022), we use 12 real-world datasets from the UCI Machine Learning repository (Dua & Graff, 2017) with their characteristics deferred to A.1.
Dataset Splits | No | The paper refers to 'training epochs' and a 'training regime' but does not specify a validation split or how one was used for model selection. It mentions halving each dataset for generalization testing ('We halve each dataset into two subsets D and D′'), but this is not a typical train/validation/test split for model tuning.
Hardware Specification | No | The paper does not mention any specific hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances with their specifications.
Software Dependencies | No | The paper mentions software components like 'Transformer', 'MLP layer', and 'Adam optimizer' but does not specify version numbers for these or for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The default parameter setting of REMASKER is listed in Table 7: optimizer: Adam; initial learning rate: 1e-3; LR scheduler: cosine annealing; gradient clipping threshold: 5.0; training epochs: 600; batch size: 64; masking ratio: 0.5; Transformer blocks: 8; embedding width: 64; number of heads: 4.
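
The Table 7 defaults map directly onto a standard PyTorch training setup. The following is a minimal sketch, not the authors' released code: the nn.TransformerEncoder stack is a hypothetical stand-in for REMASKER's masked autoencoder, and the masked-reconstruction loss is elided.

    # Sketch of the Table 7 defaults as a PyTorch training configuration.
    # NOT the authors' implementation: the encoder below is a generic stand-in,
    # and the re-masking / reconstruction-loss logic is omitted.
    import torch
    from torch import nn

    cfg = {
        "lr": 1e-3,          # initial learning rate
        "epochs": 600,       # training epochs
        "batch_size": 64,
        "mask_ratio": 0.5,   # masking ratio
        "blocks": 8,         # Transformer blocks
        "embed_dim": 64,     # embedding width
        "num_heads": 4,      # number of heads
        "grad_clip": 5.0,    # gradient clipping threshold
    }

    # Stand-in encoder: one TransformerEncoderLayer per "block" in Table 7.
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(
            d_model=cfg["embed_dim"], nhead=cfg["num_heads"], batch_first=True
        ),
        num_layers=cfg["blocks"],
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cfg["epochs"])

    for epoch in range(cfg["epochs"]):
        # ... compute the masked-reconstruction loss and call loss.backward() here ...
        torch.nn.utils.clip_grad_norm_(model.parameters(), cfg["grad_clip"])
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()

In this sketch the cosine-annealing scheduler is stepped once per epoch, which is how T_max=600 is interpreted; gradient clipping with threshold 5.0 is applied immediately before each optimizer step.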