Neural Deep Equilibrium Solvers

Authors: Shaojie Bai, Vladlen Koltun, J Zico Kolter

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that these neural equilibrium solvers are fast to train (taking only an extra 0.9-1.1% over the original DEQ's training time), require few additional parameters (1-3% of the original model size), yet lead to a 2× speedup in DEQ network inference without any degradation in accuracy across numerous domains and tasks.
Researcher Affiliation | Collaboration | Shaojie Bai (Carnegie Mellon University); Vladlen Koltun (Apple); J. Zico Kolter (Carnegie Mellon University and Bosch Center for AI)
Pseudocode | Yes | Algorithm 1: Anderson acceleration (AA) prototype (with parameters β and m). A hedged sketch of this scheme follows the table.
Open Source Code | Yes | Code is available at https://github.com/locuslab/deq.
Open Datasets | Yes | To evaluate the neural deep equilibrium solvers, we apply them to three of the largest-scale and highest-dimensional tasks that implicit models have ever been applied to, across the vision and language modalities. ... WikiText-103 language modeling (Merity et al., 2017), ImageNet classification (Deng et al., 2009), and Cityscapes semantic segmentation with megapixel images (Cordts et al., 2016).
Dataset Splits | Yes | The WikiText-103 corpus contains over 103M words in its training split, and 218K/246K words for validation/test.
Hardware Specification | Yes | All of our experiments were conducted on NVIDIA RTX 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using the “Adam optimizer” and building upon the “DEQ repo” and “MDEQ repo”, but does not specify software versions for programming languages, libraries, or frameworks such as PyTorch or CUDA.
Experiment Setup | Yes | Note that our approach only introduces minimal new hyperparameters (as the original DEQ model parameters are frozen). For the language modeling task, we use the Adam optimizer (Kingma & Ba, 2015) with a starting learning rate of 0.001 and cosine learning rate annealing (Loshchilov & Hutter, 2017). The neural solver is trained for 5000 steps, with sequences of length 60 and batch size 10, on top of a pretrained DEQ with word embedding dimension 700. A hedged sketch of this configuration follows the table.
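
The pseudocode row above quotes Algorithm 1, Anderson acceleration with damping β and memory m, the classical fixed-point scheme around which the paper's learned solver is built. The following is a minimal NumPy sketch of generic Anderson acceleration, not the paper's PyTorch implementation from the DEQ repository; the function name anderson, the regularized least-squares formulation, and the default values are assumptions made for illustration.

    import numpy as np

    def anderson(f, x0, m=5, beta=1.0, max_iter=50, tol=1e-6):
        # Solve the fixed-point equation x* = f(x*): keep the last m iterates,
        # compute mixing weights alpha via a small least-squares problem on the
        # residuals, and combine past iterates/evaluations with damping beta.
        X, F = [x0], [f(x0)]
        x = F[0]
        for _ in range(max_iter):
            X.append(x)
            F.append(f(x))
            n = min(m, len(X))
            # Residuals g_i = f(x_i) - x_i for the last n iterates (flattened).
            G = np.stack([(F[-n + i] - X[-n + i]).ravel() for i in range(n)])
            # Minimize ||G^T alpha|| subject to sum(alpha) = 1, via regularized
            # normal equations followed by renormalization of alpha.
            H = G @ G.T + 1e-8 * np.eye(n)
            alpha = np.linalg.solve(H, np.ones(n))
            alpha /= alpha.sum()
            x_new = (1 - beta) * sum(a * xi for a, xi in zip(alpha, X[-n:])) \
                  + beta * sum(a * fi for a, fi in zip(alpha, F[-n:]))
            if np.linalg.norm(x_new - x) < tol * (1 + np.linalg.norm(x)):
                return x_new
            x = x_new
        return x

    # Example usage: x = cos(x) has a fixed point near 0.739.
    x_star = anderson(np.cos, np.array([1.0]))

The design point worth noting is the memory m: the update extrapolates from a short history of iterates rather than the latest one alone, and the paper's contribution is to learn parts of such a solver for a specific pretrained DEQ instead of hand-tuning them.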
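
The experiment setup row specifies only optimization hyperparameters (Adam, learning rate 0.001, cosine annealing, 5000 steps, sequence length 60, batch size 10, frozen DEQ with word embedding dimension 700). The PyTorch sketch below merely wires those numbers together; the two Linear modules and the imitation-style loss are hypothetical stand-ins, not the paper's actual solver architecture or training objective.

    import torch

    # Hypothetical stand-ins; only the hyperparameters come from the quoted setup.
    pretrained_deq = torch.nn.Linear(700, 700)   # frozen pretrained DEQ (embedding dim 700)
    neural_solver = torch.nn.Linear(700, 700)    # lightweight solver network being trained

    for p in pretrained_deq.parameters():        # the original DEQ parameters are frozen
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(neural_solver.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5000)

    for step in range(5000):                     # 5000 training steps
        batch = torch.randn(10, 60, 700)         # batch size 10, sequence length 60 (dummy data)
        with torch.no_grad():
            target = pretrained_deq(batch)       # placeholder target from the frozen DEQ
        loss = (neural_solver(batch) - target).pow(2).mean()   # placeholder objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()

Because only the solver's few parameters receive gradients, a loop of this shape is consistent with the quoted claim that solver training adds little cost on top of the pretrained DEQ.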