DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations

Authors: Zuowen Wang, Longbiao Cheng, Pehuen Moure, Niklas Hahn, Shih-Chii Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verified our findings and reached 84% FLOPs reduction on the implicit neural representation task, 73% on the Sintel and 76% on the KITTI datasets for the optical flow estimation task, while keeping task accuracy comparable to models that perform the full update.
Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zurich and ETH Zurich
Pseudocode | Yes | Algorithm 1: a convolution layer with the delta rule (Eq. 17). (A sketch of this delta-rule convolution follows the table.)
Open Source Code | Yes | The code is available at https://github.com/ZuowenWang0000/Delta-Deep-Equilibrium-Models.
Open Datasets | Yes | The models were all pretrained on the FlyingChairs [16] and FlyingThings3D [40] datasets and tested on the training splits of the Sintel [8] and KITTI [24] datasets.
Dataset Splits | No | No explicit mention of specific validation dataset splits with percentages or sample counts was found.
Hardware Specification | Yes | All experiments were conducted on an Nvidia RTX 3090 GPU with 24 GB of RAM and an Intel(R) Xeon(R) W-2195 CPU @ 2.30 GHz.
Software Dependencies | No | The paper mentions using the Adam optimizer and a cosine annealing learning rate schedule, but does not specify software dependencies with version numbers (e.g., PyTorch version, Python version, specific library versions).
Experiment Setup | Yes | We use the Adam [34] optimizer and a cosine annealing learning rate schedule [38] with an initial learning rate of 0.001. The hidden state size is 512. For global early stopping, we use absolute distance with a tolerance of 0.001 and a maximum of 40 forward iterations. (A configuration sketch follows the table.)
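The Pseudocode row above refers to Algorithm 1, a convolution layer with the delta rule (Eq. 17). As a rough illustration of that idea, the sketch below caches a layer's previous input and output and re-propagates only the input changes whose magnitude exceeds a tolerance, which is where the FLOPs reduction comes from once most hidden-state entries have converged across DEQ iterations. The class name `DeltaConv2d`, the masking scheme, and the threshold handling are illustrative assumptions, not the authors' reference implementation (see the linked repository for that).

```python
# Minimal sketch of a delta-rule convolution in the spirit of Algorithm 1 / Eq. 17.
# Assumed names and details: DeltaConv2d, the boolean mask, and the threshold value.
import torch


class DeltaConv2d(torch.nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, threshold=1e-3):
        super().__init__()
        # bias=False keeps the layer linear, so conv(prev_x + delta) = conv(prev_x) + conv(delta).
        self.conv = torch.nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.threshold = threshold
        self.prev_x = None  # last propagated input (x_hat in delta-network notation)
        self.prev_y = None  # cached output from the previous iteration

    def reset(self):
        """Clear cached state before starting a new fixed-point solve."""
        self.prev_x = None
        self.prev_y = None

    def forward(self, x):
        if self.prev_x is None:
            # First iteration: dense convolution, initialize the caches.
            y = self.conv(x)
            self.prev_x, self.prev_y = x.detach(), y.detach()
            return y
        # Delta rule: only propagate input entries whose change exceeds the threshold.
        delta = x - self.prev_x
        mask = delta.abs() > self.threshold
        sparse_delta = delta * mask
        # Dense conv shown for clarity; an actual implementation would exploit the
        # sparsity of sparse_delta with a sparse/gather kernel to save FLOPs.
        y = self.prev_y + self.conv(sparse_delta)
        # Update the input cache only where the delta was actually propagated,
        # so sub-threshold changes keep accumulating until they cross the threshold.
        self.prev_x = torch.where(mask, x, self.prev_x).detach()
        self.prev_y = y.detach()
        return y
```

A layer like this would be reset once per fixed-point solve and then called repeatedly on the evolving hidden state; the sparser the supra-threshold deltas become as entries converge, the cheaper each iteration gets.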
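The Experiment Setup row quotes the optimizer, the learning rate schedule, and the early-stopping criterion. Below is a minimal PyTorch sketch of such a setup, using only the quoted values (Adam, cosine annealing, initial learning rate 0.001, hidden state size 512, absolute tolerance 0.001, at most 40 forward iterations); the solver function `forward_iteration`, the choice of max-absolute-difference as the distance, the stand-in module, and `T_max` are assumptions, since the excerpt does not specify them.

```python
# Sketch of the quoted training setup and an early-stopped fixed-point solver.
import torch


def forward_iteration(f, x, z_init, tol=1e-3, max_iter=40):
    """Iterate z <- f(z, x) with global early stopping on the absolute distance
    between successive iterates (max-abs difference assumed here)."""
    z = z_init
    for _ in range(max_iter):
        z_next = f(z, x)
        if (z_next - z).abs().max() < tol:  # tolerance of 0.001, as quoted
            return z_next
        z = z_next
    return z


# Optimizer and schedule as quoted: Adam with initial lr 0.001 and cosine annealing.
model = torch.nn.Linear(512, 512)  # stand-in for a DEQ layer with hidden state size 512
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # T_max assumed
```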