On Training Implicit Models
Authors: Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7×, and even boost the performance over approaches based on the exact gradient on ImageNet. We conduct an extensive set of synthetic, ablation, and large-scale experiments to both analyze the theoretical properties of the phantom gradient and validate its speedup and performances on various tasks, such as ImageNet [10] classification and Wikitext-103 [13] language modeling. (A phantom-gradient sketch follows the table.) |
| Researcher Affiliation | Academia | Zhengyang Geng (1,2), Xin-Yu Zhang (2), Shaojie Bai (4), Yisen Wang (2,3), Zhouchen Lin (2,3,5); 1: Zhejiang Lab, China; 2: Key Lab. of Machine Perception, School of AI, Peking University; 3: Institute for Artificial Intelligence, Peking University; 4: Carnegie Mellon University; 5: Pazhou Lab, China |
| Pseudocode | No | The paper describes implementation steps in Appendix B but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All training sources of this work are available at https://github.com/Gsunshine/phantom_grad. |
| Open Datasets | Yes | We conduct an extensive set of synthetic, ablation, and large-scale experiments to both analyze the theoretical properties of the phantom gradient and validate its speedup and performances on various tasks, such as ImageNet [10] classification and Wikitext-103 [13] language modeling. ... on the CIFAR-10 dataset, we use the MDEQ-Tiny model [3] (170K parameters) as the backbone model, denoted as the ablation setting. |
| Dataset Splits | No | The paper mentions using standard datasets like CIFAR-10, ImageNet, and Wikitext-103, which have well-known splits, but it does not explicitly state train/validation/test split percentages, absolute sample counts, or cite the specific predefined splits needed for reproduction. |
| Hardware Specification | No | The paper mentions using '8 GPUs' for training on ImageNet, but it does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or any other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies, such as programming language versions, library versions (e.g., PyTorch, TensorFlow), or solver versions, that are needed to replicate the experiments. |
| Experiment Setup | Yes | In the synthetic setting, we directly set the Lipschitz constant of F as L_h = 0.9, and use 100 fixed-point iterations to solve the root h of Eq. (1) until the relative error satisfies ‖h − F(h, z)‖ / ‖h‖ < 10⁻⁵. ... The MDEQ model employs a 10-layer unrolling for pretraining... We use the MDEQ-Tiny model [3]... Adam / SGD... UPG (A_{5,0.5}), NPG (A_{5,0.5})... UPG (A_{9,0.5})... UPG (A_{5,0.8}). (A minimal solver sketch follows the table.) |
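For context on the Experiment Setup row, the following is a minimal PyTorch-style sketch of the forward fixed-point solve with the relative-error stopping rule quoted above (tolerance 10⁻⁵, at most 100 iterations). The function name `solve_fixed_point` and its signature are assumptions for illustration, not the authors' released code.

```python
import torch

def solve_fixed_point(F, z, h0, max_iter=100, tol=1e-5):
    """Plain fixed-point iteration for h = F(h, z) (illustrative sketch).

    Stops once the relative error ||h - F(h, z)|| / ||h|| drops below tol,
    matching the 10^-5 criterion described in the synthetic setting.
    """
    h = h0
    with torch.no_grad():  # the forward solve itself is not differentiated
        for _ in range(max_iter):
            h_next = F(h, z)
            rel_err = torch.norm(h_next - h) / torch.norm(h).clamp_min(1e-12)
            h = h_next
            if rel_err < tol:
                break
    return h
```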
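Likewise, the phantom gradient referenced in the Research Type row can be illustrated by a short sketch of the unrolling-based variant A_{k,λ}: starting from the detached fixed point, k damped update steps are unrolled with autograd enabled, and the backward pass flows through this short graph instead of the exact implicit gradient. The helper name and the layer `F` are again assumptions; see the linked repository for the authors' implementation.

```python
import torch

def unrolled_phantom_gradient(F, z, h_star, k=5, lam=0.5):
    """Unrolling-based phantom gradient A_{k, lam} (illustrative sketch).

    F      : callable implementing the equilibrium layer F(h, z)
    z      : input injection (requires grad for end-to-end training)
    h_star : fixed point returned by the gradient-free forward solver
    k, lam : number of unrolled steps and damping factor, e.g. A_{5, 0.5}
    """
    u = h_star.detach()  # cut the graph of the forward solver
    for _ in range(k):
        # damped fixed-point step, tracked by autograd
        u = lam * F(u, z) + (1.0 - lam) * u
    return u  # backpropagating through u touches only these k steps
```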