Stabilizing Equilibrium Models by Jacobian Regularization
Authors: Shaojie Bai, Vladlen Koltun, Zico Kolter
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed regularization by experiments on both toy-scale synthetic tasks and large-scale real datasets across domains: word-level language modeling on WikiText-103 (Merity et al., 2017) and high-resolution image classification on the full ImageNet dataset (Deng et al., 2009). |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, Pittsburgh PA, USA 2Intel Labs, USA. Correspondence to: Shaojie Bai <shaojieb@cs.cmu.edu>. |
| Pseudocode | No | The paper describes the model architecture and processes in prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available here. |
| Open Datasets | Yes | We validate the proposed regularization by experiments on both toy-scale synthetic tasks and large-scale real datasets across domains: word-level language modeling on WikiText-103 (Merity et al., 2017) and high-resolution image classification on the full ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | We generated 5096 scalar data pairs (x, y) using the function y = h(x) = (3/2)x^3 + x^2 - 5x + 2sin(x) - 3 + δ (where δ ∼ N(0, 0.05)), and split them into 4096 training and 1000 validation samples, respectively. |
| Hardware Specification | No | The paper states, 'The memory and speeds reported are benchmarked across different models on the same setting (e.g., same batch size, sequence length, number of steps, etc.) with the same GPU,' but does not specify the model or detailed specifications of the GPU or any other hardware. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | As we found the Jacobian regularization could sometimes hurt performance (see Sec. 5.3), we only apply the proposed loss stochastically with a probability p, and gradually increase this p or the regularization strength γ (see Eq. (4)) over training steps. We also use a cosine learning rate schedule (Loshchilov & Hutter, 2017) for all tasks, including the synthetic one. |
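The Dataset Splits row can be reproduced directly. A minimal sketch of the synthetic task is below; the operators in the extracted formula were garbled, so the signs of the polynomial terms, the input sampling range, and the reading of 0.05 as a standard deviation are all assumptions:

```python
import math
import random

random.seed(0)

def h(x):
    # Target function from the review's synthetic task; the signs between
    # terms are assumptions, since extraction dropped the operators.
    return 1.5 * x**3 + x**2 - 5 * x + 2 * math.sin(x) - 3

# 5096 scalar pairs with additive noise delta ~ N(0, 0.05); 0.05 is treated
# as the standard deviation here, which is an assumption.
xs = [random.uniform(-1.0, 1.0) for _ in range(5096)]  # input range assumed
pairs = [(x, h(x) + random.gauss(0.0, 0.05)) for x in xs]

# 4096 training / 1000 validation split, as reported in the review.
train, val = pairs[:4096], pairs[4096:]
```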
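The Experiment Setup row describes two schedules: the Jacobian penalty is applied only with probability p, with p (or the strength γ) ramped up over training, alongside a cosine learning-rate schedule. A minimal sketch of both, assuming a linear ramp for p (the review only says "gradually increase") and illustrative default values:

```python
import math
import random

def regularized_loss(task_loss, jac_loss, step, max_steps,
                     p_max=1.0, gamma=1.0, rng=random):
    # Apply the Jacobian penalty stochastically with probability p; the
    # linear ramp and the p_max/gamma defaults are assumptions.
    p = p_max * min(1.0, step / max_steps)
    if rng.random() < p:
        return task_loss + gamma * jac_loss  # Eq. (4)-style weighted sum
    return task_loss

def cosine_lr(step, max_steps, lr_max=1e-3, lr_min=0.0):
    # Cosine learning-rate schedule (Loshchilov & Hutter, 2017);
    # the lr_max/lr_min values are placeholders, not the paper's settings.
    t = min(step, max_steps) / max_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

At step 0 the ramp gives p = 0, so the penalty is never applied; by `max_steps` it is applied with probability `p_max`.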
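The regularizer the review refers to penalizes the Frobenius norm of the equilibrium layer's Jacobian, estimated stochastically rather than computed exactly. A dependency-free sketch of a Hutchinson-style estimator is below; in a real deep-learning setup the vector-Jacobian product would come from autograd, and every detail here (probe count, step size, finite differences) is an illustrative assumption rather than the paper's implementation:

```python
import random

def jac_frobenius_sq(f, z, n_samples=4, h=1e-5, rng=None):
    """Monte-Carlo estimate of ||J_f(z)||_F^2 via Rademacher probes:
    E[||eps^T J||^2] = tr(J J^T) = ||J||_F^2 for eps with entries +/-1.
    VJPs are formed by central finite differences to stay dependency-free."""
    rng = rng or random.Random(0)
    n_out = len(f(z))
    est = 0.0
    for _ in range(n_samples):
        eps = [rng.choice((-1.0, 1.0)) for _ in range(n_out)]

        def g(x):  # scalar function eps^T f(x)
            return sum(e * v for e, v in zip(eps, f(x)))

        vjp = []
        for i in range(len(z)):
            zp, zm = list(z), list(z)
            zp[i] += h
            zm[i] -= h
            vjp.append((g(zp) - g(zm)) / (2 * h))
        est += sum(v * v for v in vjp)  # ||eps^T J||^2
    return est / n_samples
```

For a linear map f(z) = 2z in three dimensions, the true value is tr(4·I_3) = 12, and the Rademacher probes recover it exactly.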