Transformed Distribution Matching for Missing Value Imputation
Authors: He Zhao, Ke Sun, Amir Dezfouli, Edwin V. Bonilla
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance. |
| Researcher Affiliation | Academia | CSIRO's Data61, Australia. Correspondence to: He Zhao <he.zhao@ieee.org>. |
| Pseudocode | Yes | Algorithm 1: TDM. Learnable parameters include the missing values X[M] and the parameters θ of f. Input: data X with missing values indicated by M. Output: X with X[M] imputed, and fθ. Initialise θ of f; initialise the missing values with a noisy mean, X[M] ← nanmean(X, dim=0) + N(0, 0.1). While not converged: sample two batches of B data samples, X1 and X2; feed X1 and X2 to fθ; for i = 1…B, j = 1…B, compute the quadratic cost G[i, j]; compute L_W; update the missing values X_{1,2}[M_{1,2}] and θ with a gradient step. (A runnable sketch of this loop appears below the table.) |
| Open Source Code | Yes | Code at https://github.com/hezgit/TDM |
| Open Datasets | Yes | UCI datasets with different sizes are used in the experiments, the statistics of which are shown in Table 1 of the appendix. |
| Dataset Splits | Yes | We report the average accuracy of 5-fold cross-validation. |
| Hardware Specification | No | The paper mentions 'computing environment' when discussing running time but does not specify any particular hardware (e.g., GPU, CPU models, or memory) used for experiments. |
| Software Dependencies | No | The paper mentions 'Python/Numpy style matrix indexing', 'sklearn', and 'POT package' but does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | To minimise the loss in Eq. (8), we use RMSprop (Tieleman et al., 2012) as the optimiser with learning rate of 10^-2 and batch size of 512. ... The first one is the number of INN blocks T. ... We empirically find that T = 3 and K = 2 work well in practice ... We train our method for 10,000 iterations and report the performance based on the last iteration, which is the same for all the OT-based methods. (These settings are echoed in the usage example below.) |
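
To make the quoted pseudocode concrete, here is a minimal PyTorch sketch of Algorithm 1's training loop. All names (`nanmean_init`, `CouplingBlock`, `quadratic_cost`, `sinkhorn_loss`, `tdm_impute`) are illustrative, the RealNVP-style coupling block is a stand-in for the paper's INN blocks, and the Sinkhorn regularisation and iteration counts are assumptions. This is a sketch of the published description, not the authors' implementation (which is available at the GitHub link above).

```python
# Minimal sketch of Algorithm 1 (TDM), assuming PyTorch. Illustrative only;
# the coupling block stands in for the paper's INN blocks.
import math
import torch

def nanmean_init(X, noise_std=0.1):
    """Initialise missing entries with the column-wise nanmean plus N(0, 0.1) noise."""
    M = torch.isnan(X)                                 # missingness mask
    col_mean = torch.nanmean(X, dim=0)                 # mean over observed entries
    X = torch.where(M, col_mean.expand_as(X), X)
    return X + M.float() * noise_std * torch.randn_like(X), M

class CouplingBlock(torch.nn.Module):
    """RealNVP-style affine coupling block: a stand-in for one INN block."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = torch.nn.Sequential(
            torch.nn.Linear(self.d, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2 * (dim - self.d)))
    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        # Swap halves so that stacked blocks transform both parts.
        return torch.cat([x2 * torch.exp(torch.tanh(s)) + t, x1], dim=1)

def quadratic_cost(Z1, Z2):
    """G[i, j] = ||Z1[i] - Z2[j]||^2 between the two transformed batches."""
    return torch.cdist(Z1, Z2, p=2) ** 2

def sinkhorn_loss(G, reg=0.05, n_iters=100):
    """Entropic OT between two uniform batches (log-domain Sinkhorn);
    an approximation of the Wasserstein loss L_W in the pseudocode."""
    B1, B2 = G.shape
    log_a = torch.full((B1,), -math.log(B1))           # uniform weights 1/B1
    log_b = torch.full((B2,), -math.log(B2))           # uniform weights 1/B2
    f, g = torch.zeros(B1), torch.zeros(B2)
    for _ in range(n_iters):
        f = -reg * torch.logsumexp((g[None, :] - G) / reg + log_b[None, :], dim=1)
        g = -reg * torch.logsumexp((f[:, None] - G) / reg + log_a[:, None], dim=0)
    return (log_a.exp() * f).sum() + (log_b.exp() * g).sum()  # dual objective

def tdm_impute(X_missing, T=3, n_iters=10_000, batch_size=512, lr=1e-2):
    X, M = nanmean_init(X_missing.clone())
    X_imp = X[M].clone().requires_grad_(True)          # missing values are learnable
    f = torch.nn.Sequential(*[CouplingBlock(X.shape[1]) for _ in range(T)])
    opt = torch.optim.RMSprop([X_imp, *f.parameters()], lr=lr)
    n = X.shape[0]
    for _ in range(n_iters):
        X_full = X.clone()
        X_full[M] = X_imp                              # plug learnable values into the data
        i1 = torch.randint(n, (batch_size,))           # two random batches X1, X2
        i2 = torch.randint(n, (batch_size,))
        Z1, Z2 = f(X_full[i1]), f(X_full[i2])          # match distributions after fθ
        loss = sinkhorn_loss(quadratic_cost(Z1, Z2))
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        X[M] = X_imp
    return X, f
```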
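
A hedged usage example with the settings quoted in the Experiment Setup row (T = 3, RMSprop with learning rate 10^-2, batch size 512, 10,000 iterations); the synthetic data and the reduced iteration count are for demonstration only. Note that the paper's K = 2 is a separate INN hyperparameter that the stand-in coupling block above does not model.

```python
# Hypothetical quick run with the reported settings; synthetic data and a
# reduced iteration count are used purely for demonstration.
torch.manual_seed(0)
X_raw = torch.randn(1000, 8)
X_raw[torch.rand_like(X_raw) < 0.3] = float('nan')    # 30% values missing at random
X_imputed, f_theta = tdm_impute(X_raw, T=3, n_iters=1000, batch_size=512, lr=1e-2)
print(torch.isnan(X_imputed).any())                   # tensor(False): all entries imputed
```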