DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations
Authors: Zuowen Wang, Longbiao Cheng, Pehuen Moure, Niklas Hahn, Shih-Chii Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verified our findings and reached an 84% FLOPs reduction on the implicit neural representation task, and 73% on the Sintel and 76% on the KITTI datasets for the optical flow estimation task, while keeping task accuracy comparable to models that perform the full update. |
| Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zurich and ETH Zurich |
| Pseudocode | Yes | Algorithm 1: A convolution layer with delta rule (Eq. 17); see the delta-rule sketch below the table. |
| Open Source Code | Yes | The code is available at https://github.com/ZuowenWang0000/Delta-Deep-Equilibrium-Models. |
| Open Datasets | Yes | The models were all pretrained on the Flying Chairs [16] and Flying Things3D [40] datasets and tested on the training splits of the Sintel [8] and KITTI [24] datasets. |
| Dataset Splits | No | No explicit mention of specific validation dataset splits with percentages or sample counts was found. |
| Hardware Specification | Yes | All experiments were conducted on an Nvidia RTX 3090 GPU with 24GB of RAM and Intel(R) Xeon(R) W-2195 CPU @ 2.30GHz. |
| Software Dependencies | No | The paper mentions using Adam optimizer and cosine annealing learning rate schedule, but does not specify software dependencies with version numbers (e.g., PyTorch version, Python version, specific library versions). |
| Experiment Setup | Yes | We use the Adam [34] optimizer and a cosine annealing learning rate schedule [38] with an initial learning rate of 0.001. The hidden state size is 512. For global early stopping, we use absolute distance with a tolerance of 0.001 and a maximum of 40 forward iterations (see the fixed-point iteration sketch below the table). |
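
The Pseudocode row refers to the paper's Algorithm 1, a convolution layer with the delta rule (Eq. 17). Below is a minimal, hedged sketch of the general delta-rule idea behind DeltaDEQ: during fixed-point iterations, only activation changes above a threshold are propagated, so units that have already converged contribute (near-)zero deltas and their FLOPs can be skipped. The class name `DeltaConv2d`, the thresholding scheme, and all hyperparameters here are illustrative assumptions, not the paper's exact Algorithm 1.

```python
# Hedged sketch of a delta-rule convolution (illustrative; not the paper's exact Algorithm 1).
import torch
import torch.nn.functional as F


class DeltaConv2d(torch.nn.Module):
    """Convolution that propagates only supra-threshold input changes between iterations."""

    def __init__(self, in_ch, out_ch, kernel_size, threshold=1e-3, padding=0):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01
        )
        self.threshold = threshold
        self.padding = padding
        self.x_ref = None  # last propagated input state
        self.y_ref = None  # cached output corresponding to x_ref

    def reset(self, x0):
        # One full (dense) convolution at the start of a fixed-point solve.
        self.x_ref = x0.detach().clone()
        self.y_ref = F.conv2d(x0, self.weight, padding=self.padding)
        return self.y_ref

    def forward(self, x):
        # Delta w.r.t. the last propagated input; suppress sub-threshold changes.
        delta = x - self.x_ref
        mask = delta.abs() > self.threshold
        sparse_delta = delta * mask
        # By linearity of convolution: conv(x_ref + sparse_delta) = y_ref + conv(sparse_delta).
        # A real implementation exploits the sparsity of sparse_delta to skip FLOPs;
        # the dense call below only demonstrates the arithmetic.
        self.y_ref = self.y_ref + F.conv2d(sparse_delta, self.weight, padding=self.padding)
        # Advance the reference only where a delta was actually propagated.
        self.x_ref = self.x_ref + sparse_delta
        return self.y_ref
```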
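
The Experiment Setup row mentions global early stopping with an absolute-distance tolerance of 0.001 and at most 40 forward iterations. The following is a minimal sketch of such a stopping criterion for a DEQ fixed-point solve; the function name `solve_fixed_point` and the plain Picard iteration are assumptions for illustration, and the paper's solver may differ.

```python
# Hedged sketch of global early stopping for a DEQ fixed-point solve.
# `layer` stands for any DEQ cell f(z, x); plain Picard iteration is assumed here
# purely for illustration.
import torch


@torch.no_grad()
def solve_fixed_point(layer, x, z0, tol=1e-3, max_iter=40):
    """Iterate z_{k+1} = f(z_k, x) until the absolute update distance drops below tol."""
    z = z0
    iters = 0
    for iters in range(1, max_iter + 1):
        z_next = layer(z, x)
        diff = (z_next - z).abs().max().item()  # global absolute distance criterion
        z = z_next
        if diff < tol:
            break
    return z, iters  # equilibrium estimate and number of iterations used
```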