Stabilizing Equilibrium Models by Jacobian Regularization

Authors: Shaojie Bai, Vladlen Koltun, Zico Kolter

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the proposed regularization by experiments on both toy-scale synthetic tasks and large-scale real datasets across domains: word-level language modeling on WikiText-103 (Merity et al., 2017) and high-resolution image classification on the full ImageNet dataset (Deng et al., 2009).
Researcher Affiliation | Collaboration | 1Carnegie Mellon University, Pittsburgh, PA, USA; 2Intel Labs, USA. Correspondence to: Shaojie Bai <shaojieb@cs.cmu.edu>.
Pseudocode | No | The paper describes the model architecture and processes in prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available here.
Open Datasets | Yes | We validate the proposed regularization by experiments on both toy-scale synthetic tasks and large-scale real datasets across domains: word-level language modeling on WikiText-103 (Merity et al., 2017) and high-resolution image classification on the full ImageNet dataset (Deng et al., 2009).
Dataset Splits | Yes | We generated 5096 scalar data pairs (x, y) using the function y = h(x) = (3/2)x^3 + x^2 - 5x + 2 sin(x) - 3 + δ (where δ ~ N(0, 0.05)), and split them into 4096 training and 1000 validation samples, respectively.
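The synthetic split described above can be sketched as follows. The coefficients of h(x) are a best-effort reading of the paper's formula, and the input sampling range is an assumption not stated in the quoted text:

```python
import numpy as np

def h(x, rng):
    # Synthetic target: (3/2)x^3 + x^2 - 5x + 2*sin(x) - 3,
    # plus Gaussian noise delta ~ N(0, 0.05).
    delta = rng.normal(0.0, 0.05, size=x.shape)
    return 1.5 * x**3 + x**2 - 5.0 * x + 2.0 * np.sin(x) - 3.0 + delta

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5096)  # input range is an assumption
y = h(x, rng)

# 4096 training / 1000 validation samples, as reported.
x_train, y_train = x[:4096], y[:4096]
x_val, y_val = x[4096:], y[4096:]
```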
Hardware Specification | No | The paper states, 'The memory and speeds reported are benchmarked across different models on the same setting (e.g., same batch size, sequence length, number of steps, etc.) with the same GPU,' but does not specify the model or detailed specifications of the GPU or any other hardware.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | As we found the Jacobian regularization could sometimes hurt performance (see Sec. 5.3), we only apply the proposed loss stochastically with a probability p, and gradually increase this p or the regularization strength γ (see Eq. (4)) over training steps. We also use a cosine learning rate schedule (Loshchilov & Hutter, 2017) for all tasks, including the synthetic one.
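The setup above — a cosine learning-rate schedule plus a Jacobian loss applied only with probability p, with p and γ ramped over training — can be sketched as below. The linear ramp and the final values of p and γ are illustrative assumptions; the paper only states that they increase over training steps:

```python
import math
import random

def cosine_lr(step, total_steps, lr_max):
    # Cosine learning-rate schedule (Loshchilov & Hutter, 2017):
    # decays from lr_max at step 0 to ~0 at total_steps.
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * step / total_steps))

def jac_reg_weight(step, total_steps, p_final=1.0, gamma_final=1.0, rng=random):
    # Apply the Jacobian loss only with probability p; both p and the
    # strength gamma are ramped up linearly (an assumed schedule).
    frac = step / total_steps
    p = p_final * frac
    gamma = gamma_final * frac
    return gamma if rng.random() < p else 0.0

# Per step, the total loss would then be:
#   loss = task_loss + jac_reg_weight(step, total_steps) * jacobian_loss
```

Returning a weight of 0 when the coin flip fails lets the training loop keep a single loss expression instead of branching on whether the regularizer is active.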