Sparse Coding with Gated Learned ISTA

Authors: Kailun Wu, Yiwen Guo, Ziang Li, Changshui Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results confirm our theoretical findings and verify the effectiveness of our method.
Researcher Affiliation | Collaboration | Kailun Wu, Department of Automation, Tsinghua University, Beijing, P.R. China, wukl14@mails.tsinghua.edu.cn, kailun.wukailun@alibaba-inc.com; Yiwen Guo, Bytedance AI Lab, Beijing, P.R. China, guoyiwen.ai@bytedance.com; Ziang Li, Department of Automation, Tsinghua University, Beijing, P.R. China, liza19@mails.tsinghua.edu.cn; Changshui Zhang, Department of Automation, Tsinghua University, Beijing, P.R. China, zcs@mail.tsinghua.edu.cn. Kailun Wu, Ziang Li, and Changshui Zhang are with the Institute for Artificial Intelligence (THUAI), the State Key Lab of Intelligent Technologies and Systems, the Beijing National Research Center for Information Science and Technology (BNRist), and the Department of Automation, Tsinghua University. Kailun Wu did this work while he was a Ph.D. student at Tsinghua University; he now works at Alibaba Group.
Pseudocode | Yes | Algorithm 1: ISTA with adaptive overshoot (a hedged sketch appears after the table).
Open Source Code | Yes | The core code of this paper can be found on GitHub: https://github.com/wukailun/GLISTA/.
Open Datasets | No | The paper describes generating synthetic data and references prior works for the process ('We randomly synthesize in-stream x's and ε to obtain y for training, and we let two extra sets consisting of 1000 samples each as the validation and test sets, just like in prior works (Chen et al., 2018; Liu et al., 2019; Borgerding et al., 2017)'). However, it does not provide concrete access information (link, DOI, specific repository name, or citation to a pre-existing dataset) for a publicly available or open dataset.
Dataset Splits | Yes | We randomly synthesize in-stream x's and ε to obtain y for training, and we let two extra sets consisting of 1000 samples each as the validation and test sets, just like in prior works (see the data-generation sketch after the table).
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU/GPU models, memory amounts, or detailed computer specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using Adam as an optimizer ('We use Adam (Cho et al., 2014) and let β1 = 0.9 and β2 = 0.999') and states that training follows Chen et al. (2018), but it does not specify software dependencies with version numbers (e.g., specific Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For the proposed gated LISTA and other deep learning-based methods, we set d = 16 and let {b(t)} not be shared between different layers under all circumstances. The weight matrices {W(t), U(t)} are not shared either in our method, and the coupled constraints W(t) = I - U(t)A, for all t, are imposed. For all gates, νt is initialized as 1.0, and we also let the initial value of µt in the inverse proportional function powered gain gate be 1.0, since Eq. (12) and (13) indicate 0 ≤ µt ≤ 2. Other learnable parameters in our gates are uniformly initialized as 5.0 according to the suggested range of the gates. The training batch size is 64. We use Adam (Cho et al., 2014) and let β1 = 0.9 and β2 = 0.999. The hyper-parameters are tuned on the validation set and fixed for all our experiments in the sequel. Our training follows that of Chen et al. (2018). That is, the sparse coding network is trained progressively to update more layers, and we cut the learning rate for the currently optimized layers when no decrease in the validation loss is observed for 4000 iterations, with a base learning rate of 0.0005. Training on the current layers stops when the validation loss no longer decreases with the learning rate cut to 0.00001 (see the training-schedule sketch after the table).
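
The Pseudocode row above points to Algorithm 1, "ISTA with adaptive overshoot". As a rough illustration only, here is a minimal NumPy sketch of plain ISTA with a fixed overshoot (relaxation) factor; the paper's Algorithm 1 adapts this factor, so the constant mu, the default values, and the function names below are assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, theta):
    # Element-wise soft-thresholding, the proximal operator of the L1 penalty.
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista_with_overshoot(A, y, lam=0.1, mu=1.5, n_iter=200):
    # Minimal ISTA with a *fixed* overshoot factor mu (mu = 1 recovers vanilla ISTA).
    # The adaptive overshoot rule of the paper's Algorithm 1 is not reproduced here.
    L = np.linalg.norm(A, ord=2) ** 2          # Lipschitz constant of the gradient of 0.5 * ||y - Ax||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad_step = x + A.T @ (y - A @ x) / L  # gradient step on the data-fidelity term
        x_ista = soft_threshold(grad_step, lam / L)
        x = x + mu * (x_ista - x)              # overshoot past the plain ISTA iterate when mu > 1
    return x
```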
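
The dataset rows above describe synthetic sparse-coding data: sparse codes x and noise ε are drawn at random so that y = Ax + ε, training samples are generated in-stream, and the validation and test sets contain 1000 samples each. The sketch below follows that recipe under stated assumptions; the dictionary size, sparsity level, and noise scale are illustrative values, not quoted from the paper.

```python
import numpy as np

def synthesize(A, n_samples, p_nonzero=0.1, noise_std=0.01, rng=None):
    # Draw sparse codes x (Bernoulli support times Gaussian magnitudes) and
    # Gaussian noise eps, then form observations y = A x + eps.
    # p_nonzero and noise_std are illustrative assumptions.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    support = rng.random((n_samples, n)) < p_nonzero
    x = support * rng.standard_normal((n_samples, n))
    eps = noise_std * rng.standard_normal((n_samples, m))
    y = x @ A.T + eps
    return x, y

rng = np.random.default_rng(0)
m, n = 250, 500                               # hypothetical dictionary size
A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian dictionary with roughly unit-norm columns
x_val, y_val = synthesize(A, 1000, rng=rng)   # 1000-sample validation set
x_test, y_test = synthesize(A, 1000, rng=rng) # 1000-sample test set
# Training batches (size 64, per the setup row) would be synthesized on the fly.
```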
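
The Experiment Setup row describes a progressive, layer-wise training schedule: Adam with β1 = 0.9 and β2 = 0.999, batch size 64, base learning rate 0.0005, a learning-rate cut whenever the validation loss has not decreased for 4000 iterations, and a stop for the current stage once the rate has been cut to 0.00001. Below is a framework-agnostic sketch of that schedule; `train_step` and `val_loss` are hypothetical callbacks, and the cut factor of 10 is an assumption (the quoted text only says the rate is cut).

```python
import math

def progressive_training(train_step, val_loss, n_layers,
                         base_lr=5e-4, min_lr=1e-5, patience=4000, cut=10.0):
    # Train the unrolled network stage by stage, each stage updating one more layer.
    # train_step(n_active, lr) is assumed to run one Adam update (batch size 64,
    # beta1=0.9, beta2=0.999); val_loss(n_active) evaluates the validation loss
    # of the network truncated at n_active layers.
    for n_active in range(1, n_layers + 1):
        lr, best, stale = base_lr, math.inf, 0
        while True:
            train_step(n_active, lr)
            loss = val_loss(n_active)
            if loss < best:
                best, stale = loss, 0
            else:
                stale += 1
            if stale >= patience:                   # no improvement for 4000 iterations
                if lr <= min_lr:                    # already cut to 1e-5: move to the next stage
                    break
                lr, stale = max(lr / cut, min_lr), 0  # cut the learning rate (factor is an assumption)
```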