Hiding Data Helps: On the Benefits of Masking for Sparse Coding

Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Venue: ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We corroborate our theoretical results with experiments across several parameter regimes showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective. ... In this section, we examine whether the separation between the performance of sparse coding with or without masking (demonstrated by Theorems 3.2 and 3.6) manifests in practice."
Researcher Affiliation | Academia | "(1) Department of Computer Science, Duke University; (2) Department of Computer Science, Brown University. Correspondence to: Muthu Chidambaram <muthu@cs.duke.edu>."
Pseudocode | Yes | "Algorithm 2 Algorithm for Optimizing L; Algorithm 3 Algorithm for Optimizing L_mask"
Open Source Code | Yes | "Code for the experiments in this section can be found at: https://github.com/2014mchidamb/masked-sparse-coding-icml."
Open Datasets | No | "We generate a dataset of n = 1000 samples y_i = A z_i + ϵ_i, where A ∈ ℝ^(d×p) is a standard Gaussian ensemble with normalized columns, the z_i have uniformly random k-sparse supports whose entries are i.i.d. N(0, 1), and the ϵ_i are mean-zero Gaussian noise with some fixed variance (which we will vary in our experiments)." (See the data-generation sketch after the table.)
Dataset Splits | No | The paper mentions "a held-out set of p samples from the data-generating process for initializing the dictionary B^(0)" but does not specify formal training/validation/test splits with percentages, counts, or citations to predefined splits for reproducibility.
Hardware Specification | Yes | "Our implementation is in PyTorch (Paszke et al., 2019), and all of our experiments were conducted on a single P100 GPU."
Software Dependencies | No | "Our implementation is in PyTorch (Paszke et al., 2019)..." The paper mentions PyTorch but does not specify a version number or other software dependencies with versions.
Experiment Setup | Yes | "For training, we use batch versions of Algorithms 2 and 3 in which we perform gradient updates with respect to the mean losses computed over {y_1, ..., y_B} with B = 200 as the batch size. For the actual gradient step, we use Adam (Kingma & Ba, 2014) with its default hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of η = 0.001... We train for 500 epochs (passes over the entire dataset) for both Algorithms 2 and 3. For Algorithm 3, we always use a mask size of d/10." (See the training-loop sketch after the table.)
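
To make the data-generating process quoted in the Open Datasets row concrete, here is a minimal PyTorch sketch. Only n = 1000 is stated in the excerpt; the function name `generate_data` and the values of d, p, k, and the noise variance are our own illustrative placeholders, and the authors' released code at the repository linked in the table is the authoritative reference.

```python
import torch

def generate_data(n=1000, d=64, p=128, k=5, noise_std=0.1, seed=0):
    """Sketch of the synthetic data described in the paper: y_i = A z_i + eps_i,
    with A a d x p standard Gaussian ensemble with normalized columns, z_i k-sparse
    with i.i.d. N(0, 1) nonzeros on a uniformly random support, and eps_i mean-zero
    Gaussian noise of fixed variance. Dimensions other than n are placeholders."""
    g = torch.Generator().manual_seed(seed)

    # Ground-truth dictionary: standard Gaussian entries, columns normalized to unit norm.
    A = torch.randn(d, p, generator=g)
    A = A / A.norm(dim=0, keepdim=True)

    # Sparse codes: uniformly random k-sparse supports, N(0, 1) nonzero entries.
    Z = torch.zeros(n, p)
    for i in range(n):
        support = torch.randperm(p, generator=g)[:k]
        Z[i, support] = torch.randn(k, generator=g)

    # Observations y_i = A z_i + eps_i with noise variance noise_std ** 2.
    Y = Z @ A.T + noise_std * torch.randn(n, d, generator=g)
    return Y, Z, A
```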
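
Similarly, the following sketch shows what batch training under the quoted Experiment Setup row could look like: Adam with β1 = 0.9, β2 = 0.999, a learning rate of 0.001, batch size 200, 500 epochs, and a mask size of d/10 for the masked objective. The form of the masked loss (encode on the visible coordinates, score reconstruction on the hidden ones), the ISTA-based encoding step, and the helper names `ista_codes` and `train_masked_dictionary` are assumptions on our part, not a transcription of Algorithms 2 and 3.

```python
import torch

def ista_codes(B_sub, y_sub, lam=0.1, steps=50):
    """A few ISTA iterations as a stand-in for the paper's encoding step
    (an assumption on our part; lam and steps are illustrative placeholders)."""
    p = B_sub.shape[1]
    z = torch.zeros(y_sub.shape[0], p)
    # Step size 1/L, where L is the Lipschitz constant of the least-squares gradient.
    L = float(torch.linalg.matrix_norm(B_sub, ord=2) ** 2) + 1e-6
    for _ in range(steps):
        grad = (z @ B_sub.T - y_sub) @ B_sub
        z = torch.nn.functional.softshrink(z - grad / L, lambd=lam / L)
    return z

def train_masked_dictionary(Y, p, mask_size, epochs=500, batch_size=200, lr=1e-3):
    """Batch training in the spirit of the quoted setup: Adam with betas (0.9, 0.999),
    learning rate 1e-3, batch size 200, 500 epochs, mask size d/10 for the masked
    objective. The loss below (encode on the visible coordinates, score reconstruction
    on the hidden ones) is our reading of the masking idea, not Algorithm 3 itself."""
    n, d = Y.shape
    B = torch.randn(d, p, requires_grad=True)  # dictionary iterate; the paper initializes
                                               # B^(0) from a held-out set of p samples.
    opt = torch.optim.Adam([B], lr=lr, betas=(0.9, 0.999))

    for _ in range(epochs):
        perm = torch.randperm(n)
        for start in range(0, n, batch_size):
            y = Y[perm[start:start + batch_size]]

            # Hide a random set of coordinates (one mask shared across the batch).
            hidden = torch.zeros(d, dtype=torch.bool)
            hidden[torch.randperm(d)[:mask_size]] = True

            # Encode using only the visible coordinates; codes are treated as fixed
            # for the dictionary update (a simplification in this sketch).
            with torch.no_grad():
                z = ista_codes(B[~hidden], y[:, ~hidden])

            # Score the dictionary on the coordinates hidden from the encoder.
            loss = ((z @ B[hidden].T - y[:, hidden]) ** 2).mean()

            opt.zero_grad()
            loss.backward()
            opt.step()
    return B.detach()

# Example usage with the synthetic data sketched above (dimensions are placeholders):
# Y, _, A = generate_data(n=1000, d=64, p=128, k=5, noise_std=0.1)
# B_hat = train_masked_dictionary(Y, p=128, mask_size=64 // 10)
```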