Hiding Data Helps: On the Benefits of Masking for Sparse Coding
Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical results with experiments across several parameter regimes showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective. ... In this section, we examine whether the separation between the performance of sparse coding with or without masking (demonstrated by Theorems 3.2 and 3.6) manifests in practice. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Duke University 2Department of Computer Science, Brown University. Correspondence to: Muthu Chidambaram <muthu@cs.duke.edu>. |
| Pseudocode | Yes | Algorithm 2 Algorithm for Optimizing L; Algorithm 3 Algorithm for Optimizing Lmask |
| Open Source Code | Yes | Code for the experiments in this section can be found at: https://github.com/2014mchidamb/masked-sparse-coding-icml. |
| Open Datasets | No | We generate a dataset of n = 1000 samples yi = Azi + ϵi, where A ∈ ℝ^(d×p) is a standard Gaussian ensemble with normalized columns, the zi have uniformly random k-sparse supports whose entries are i.i.d. N(0, 1), and the ϵi are mean zero Gaussian noise with some fixed variance (which we will vary in our experiments). (See the data-generation sketch below.) |
| Dataset Splits | No | The paper mentions 'a held-out set of p samples from the data-generating process for initializing the dictionary B(0)' but does not specify formal training/validation/test splits with percentages, counts, or citations to predefined splits for reproducibility. |
| Hardware Specification | Yes | Our implementation is in PyTorch (Paszke et al., 2019), and all of our experiments were conducted on a single P100 GPU. |
| Software Dependencies | No | Our implementation is in PyTorch (Paszke et al., 2019)... The paper mentions PyTorch but does not specify a version number or other software dependencies with versions. |
| Experiment Setup | Yes | For training, we use batch versions of Algorithms 2 and 3 in which we perform gradient updates with respect to the mean losses computed over {y1, ..., yB} with B = 200 as the batch size. For the actual gradient step, we use Adam (Kingma & Ba, 2014) with its default hyperparameters of β1 = 0.9, β2 = 0.999 and a learning rate of η = 0.001... We train for 500 epochs (passes over the entire dataset) for both Algorithms 2 and 3. For Algorithm 3, we always use a mask size of ⌈d/10⌉. (See the training-loop sketch below.) |
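
The data-generating process quoted in the Open Datasets row translates directly into a short PyTorch snippet. The sketch below is an illustration, not the authors' code: the quoted text fixes only n = 1000, so the dimensions d and p, the sparsity k, and the noise standard deviation used here are placeholder values.

```python
import torch

def generate_data(n=1000, d=64, p=128, k=5, noise_std=0.1):
    """Sketch of the synthetic data y_i = A z_i + eps_i described in the paper.

    Only n = 1000 is stated in the quoted text; d, p, k, and noise_std
    are placeholder values chosen for illustration.
    """
    # Standard Gaussian ensemble A in R^{d x p} with normalized columns.
    A = torch.randn(d, p)
    A = A / A.norm(dim=0, keepdim=True)

    # Codes z_i with uniformly random k-sparse supports and i.i.d. N(0, 1) entries.
    Z = torch.zeros(n, p)
    for i in range(n):
        support = torch.randperm(p)[:k]
        Z[i, support] = torch.randn(k)

    # Observations y_i = A z_i + eps_i with mean-zero Gaussian noise.
    Y = Z @ A.T + noise_std * torch.randn(n, d)
    return A, Z, Y
```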
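
Similarly, the optimizer configuration quoted in the Experiment Setup row corresponds to a standard mini-batch Adam loop. The sketch below only wires up the reported hyperparameters (batch size 200, 500 epochs, Adam with β1 = 0.9, β2 = 0.999, η = 0.001); the `loss_fn` argument stands in for the batch version of Algorithm 2 or Algorithm 3, whose exact form is given by the paper's pseudocode and is not reproduced here.

```python
import torch

def train_dictionary(Y, B_init, loss_fn, batch_size=200, epochs=500, lr=1e-3):
    """Mini-batch training loop with the hyperparameters reported in the paper.

    loss_fn(B, batch) should return per-sample losses for the chosen
    objective (Algorithm 2 or Algorithm 3); it is a placeholder here.
    """
    B = B_init.clone().requires_grad_(True)  # dictionary iterate B^(t)
    optimizer = torch.optim.Adam([B], lr=lr, betas=(0.9, 0.999))

    n = Y.shape[0]
    for _ in range(epochs):  # 500 passes over the full dataset
        perm = torch.randperm(n)
        for start in range(0, n, batch_size):
            batch = Y[perm[start:start + batch_size]]
            loss = loss_fn(B, batch).mean()  # gradient step w.r.t. the mean batch loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return B.detach()
```

The held-out initialization B(0) mentioned in the Dataset Splits row would be passed in as `B_init`.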