On Training Implicit Models
Authors: Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7×, and even boost the performance over approaches based on the exact gradient on ImageNet. We conduct an extensive set of synthetic, ablation, and large-scale experiments to both analyze the theoretical properties of the phantom gradient and validate its speedup and performances on various tasks, such as ImageNet [10] classification and Wikitext-103 [13] language modeling. (A phantom-gradient sketch follows the table.) |
| Researcher Affiliation | Academia | Zhengyang Geng (1,2), Xin-Yu Zhang (2), Shaojie Bai (4), Yisen Wang (2,3), Zhouchen Lin (2,3,5); 1: Zhejiang Lab, China; 2: Key Lab. of Machine Perception, School of AI, Peking University; 3: Institute for Artificial Intelligence, Peking University; 4: Carnegie Mellon University; 5: Pazhou Lab, China |
| Pseudocode | No | The paper describes implementation steps in Appendix B but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All training sources of this work are available at https://github.com/Gsunshine/phantom_grad. |
| Open Datasets | Yes | We conduct an extensive set of synthetic, ablation, and large-scale experiments to both analyze the theoretical properties of the phantom gradient and validate its speedup and performances on various tasks, such as ImageNet [10] classification and Wikitext-103 [13] language modeling. ... on the CIFAR-10 dataset, we use the MDEQ-Tiny model [3] (170K parameters) as the backbone model, denoted as the ablation setting. |
| Dataset Splits | No | The paper mentions using standard datasets like CIFAR-10, ImageNet, and Wikitext-103, which have well-known splits, but it does not explicitly state train/validation/test split percentages, absolute sample counts, or cite the specific predefined splits needed for reproduction. |
| Hardware Specification | No | The paper mentions using '8 GPUs' for training on ImageNet, but it does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or any other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies, such as programming language versions, library versions (e.g., PyTorch, TensorFlow), or solver versions, that are needed to replicate the experiments. |
| Experiment Setup | Yes | In the synthetic setting, we directly set the Lipschitz constant of F as L_h = 0.9, and use 100 fixed-point iterations to solve the root h of Eq. (1) until the relative error satisfies ‖h − F(h, z)‖ / ‖h‖ < 10⁻⁵. ... The MDEQ model employs a 10-layer unrolling for pretraining... We use the MDEQ-Tiny model [3]... Adam / SGD... UPG (A_{5,0.5}), NPG (A_{5,0.5})... UPG (A_{9,0.5})... UPG (A_{5,0.8}). (A minimal solver sketch follows the table.) |
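For context on the Experiment Setup row, the following is a minimal PyTorch-style sketch of the forward fixed-point solve with the relative-error stopping rule quoted above (tolerance 10⁻⁵, at most 100 iterations). The function name `solve_fixed_point` and its signature are assumptions for illustration, not the authors' released code.

```python
import torch

def solve_fixed_point(F, z, h0, max_iter=100, tol=1e-5):
    """Plain fixed-point iteration for h = F(h, z) (illustrative sketch).

    Stops once the relative error ||h - F(h, z)|| / ||h|| drops below tol,
    matching the 10^-5 criterion described in the synthetic setting.
    """
    h = h0
    with torch.no_grad():  # the forward solve itself is not differentiated
        for _ in range(max_iter):
            h_next = F(h, z)
            rel_err = torch.norm(h_next - h) / torch.norm(h).clamp_min(1e-12)
            h = h_next
            if rel_err < tol:
                break
    return h
```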
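Likewise, the phantom gradient referenced in the Research Type row can be illustrated by a short sketch of the unrolling-based variant A_{k,λ}: starting from the detached fixed point, k damped update steps are unrolled with autograd enabled, and the backward pass flows through this short graph instead of the exact implicit gradient. The helper name and the layer `F` are again assumptions; see the linked repository for the authors' implementation.

```python
import torch

def unrolled_phantom_gradient(F, z, h_star, k=5, lam=0.5):
    """Unrolling-based phantom gradient A_{k, lam} (illustrative sketch).

    F      : callable implementing the equilibrium layer F(h, z)
    z      : input injection (requires grad for end-to-end training)
    h_star : fixed point returned by the gradient-free forward solver
    k, lam : number of unrolled steps and damping factor, e.g. A_{5, 0.5}
    """
    u = h_star.detach()  # cut the graph of the forward solver
    for _ in range(k):
        # damped fixed-point step, tracked by autograd
        u = lam * F(u, z) + (1.0 - lam) * u
    return u  # backpropagating through u touches only these k steps
```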