Hiding Data Helps: On the Benefits of Masking for Sparse Coding

Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

Venue: ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We corroborate our theoretical results with experiments across several parameter regimes showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective. ... In this section, we examine whether the separation between the performance of sparse coding with or without masking (demonstrated by Theorems 3.2 and 3.6) manifests in practice."
Researcher Affiliation | Academia | "(1) Department of Computer Science, Duke University; (2) Department of Computer Science, Brown University. Correspondence to: Muthu Chidambaram <muthu@cs.duke.edu>."
Pseudocode | Yes | "Algorithm 2 Algorithm for Optimizing L; Algorithm 3 Algorithm for Optimizing L_mask"
Open Source Code | Yes | "Code for the experiments in this section can be found at: https://github.com/2014mchidamb/masked-sparse-coding-icml."
Open Datasets | No | "We generate a dataset of n = 1000 samples y_i = A z_i + ϵ_i, where A ∈ ℝ^(d×p) is a standard Gaussian ensemble with normalized columns, the z_i have uniformly random k-sparse supports whose entries are i.i.d. N(0, 1), and the ϵ_i are mean-zero Gaussian noise with some fixed variance (which we will vary in our experiments)." (See the data-generation sketch after the table.)
Dataset Splits | No | The paper mentions "a held-out set of p samples from the data-generating process for initializing the dictionary B^(0)" but does not specify formal training/validation/test splits with percentages, counts, or citations to predefined splits for reproducibility.
Hardware Specification | Yes | "Our implementation is in PyTorch (Paszke et al., 2019), and all of our experiments were conducted on a single P100 GPU."
Software Dependencies | No | "Our implementation is in PyTorch (Paszke et al., 2019)..." The paper mentions PyTorch but does not specify a version number or other software dependencies with versions.
Experiment Setup | Yes | "For training, we use batch versions of Algorithms 2 and 3 in which we perform gradient updates with respect to the mean losses computed over {y_1, ..., y_B} with B = 200 as the batch size. For the actual gradient step, we use Adam (Kingma & Ba, 2014) with its default hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of η = 0.001... We train for 500 epochs (passes over the entire dataset) for both Algorithms 2 and 3. For Algorithm 3, we always use a mask size of d/10." (See the training-loop sketch after the table.)
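
To make the data-generating process quoted in the Open Datasets row concrete, here is a minimal PyTorch sketch. Only n = 1000 is stated in the excerpt; the function name `generate_data` and the values of d, p, k, and the noise variance are our own illustrative placeholders, and the authors' released code at the repository linked in the table is the authoritative reference.

```python
import torch

def generate_data(n=1000, d=64, p=128, k=5, noise_std=0.1, seed=0):
    """Sketch of the synthetic data described in the paper: y_i = A z_i + eps_i,
    with A a d x p standard Gaussian ensemble with normalized columns, z_i k-sparse
    with i.i.d. N(0, 1) nonzeros on a uniformly random support, and eps_i mean-zero
    Gaussian noise of fixed variance. Dimensions other than n are placeholders."""
    g = torch.Generator().manual_seed(seed)

    # Ground-truth dictionary: standard Gaussian entries, columns normalized to unit norm.
    A = torch.randn(d, p, generator=g)
    A = A / A.norm(dim=0, keepdim=True)

    # Sparse codes: uniformly random k-sparse supports, N(0, 1) nonzero entries.
    Z = torch.zeros(n, p)
    for i in range(n):
        support = torch.randperm(p, generator=g)[:k]
        Z[i, support] = torch.randn(k, generator=g)

    # Observations y_i = A z_i + eps_i with noise variance noise_std ** 2.
    Y = Z @ A.T + noise_std * torch.randn(n, d, generator=g)
    return Y, Z, A
```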
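
Similarly, the following sketch shows what batch training under the quoted Experiment Setup row could look like: Adam with β1 = 0.9, β2 = 0.999, a learning rate of 0.001, batch size 200, 500 epochs, and a mask size of d/10 for the masked objective. The form of the masked loss (encode on the visible coordinates, score reconstruction on the hidden ones), the ISTA-based encoding step, and the helper names `ista_codes` and `train_masked_dictionary` are assumptions on our part, not a transcription of Algorithms 2 and 3.

```python
import torch

def ista_codes(B_sub, y_sub, lam=0.1, steps=50):
    """A few ISTA iterations as a stand-in for the paper's encoding step
    (an assumption on our part; lam and steps are illustrative placeholders)."""
    p = B_sub.shape[1]
    z = torch.zeros(y_sub.shape[0], p)
    # Step size 1/L, where L is the Lipschitz constant of the least-squares gradient.
    L = float(torch.linalg.matrix_norm(B_sub, ord=2) ** 2) + 1e-6
    for _ in range(steps):
        grad = (z @ B_sub.T - y_sub) @ B_sub
        z = torch.nn.functional.softshrink(z - grad / L, lambd=lam / L)
    return z

def train_masked_dictionary(Y, p, mask_size, epochs=500, batch_size=200, lr=1e-3):
    """Batch training in the spirit of the quoted setup: Adam with betas (0.9, 0.999),
    learning rate 1e-3, batch size 200, 500 epochs, mask size d/10 for the masked
    objective. The loss below (encode on the visible coordinates, score reconstruction
    on the hidden ones) is our reading of the masking idea, not Algorithm 3 itself."""
    n, d = Y.shape
    B = torch.randn(d, p, requires_grad=True)  # dictionary iterate; the paper initializes
                                               # B^(0) from a held-out set of p samples.
    opt = torch.optim.Adam([B], lr=lr, betas=(0.9, 0.999))

    for _ in range(epochs):
        perm = torch.randperm(n)
        for start in range(0, n, batch_size):
            y = Y[perm[start:start + batch_size]]

            # Hide a random set of coordinates (one mask shared across the batch).
            hidden = torch.zeros(d, dtype=torch.bool)
            hidden[torch.randperm(d)[:mask_size]] = True

            # Encode using only the visible coordinates; codes are treated as fixed
            # for the dictionary update (a simplification in this sketch).
            with torch.no_grad():
                z = ista_codes(B[~hidden], y[:, ~hidden])

            # Score the dictionary on the coordinates hidden from the encoder.
            loss = ((z @ B[hidden].T - y[:, hidden]) ** 2).mean()

            opt.zero_grad()
            loss.backward()
            opt.step()
    return B.detach()

# Example usage with the synthetic data sketched above (dimensions are placeholders):
# Y, _, A = generate_data(n=1000, d=64, p=128, k=5, noise_std=0.1)
# B_hat = train_masked_dictionary(Y, p=128, mask_size=64 // 10)
```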