Deep Fusion: Efficient Network Training via Pre-trained Initializations

Authors: Hanna Mazzawi, Javier Gonzalvo, Michael Wunder, Sammy Jerome, Benoit Dherin

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show how Deep Fusion is a practical and effective approach that not only accelerates the training process but also reduces computational requirements, maintaining or surpassing traditional training methods' performance in various NLP tasks and T5 model sizes.
Researcher Affiliation | Industry | Hanna Mazzawi (1), Xavi Gonzalvo (1), Michael Wunder (1), Sammy Jerome (1), Benoit Dherin (2); (1) Google Research, New York, NY, USA; (2) Google, Sunnyvale, CA, USA.
Pseudocode | No | The paper defines the FUSION operator using mathematical equations (Eq. 5) but does not provide structured pseudocode or algorithm blocks. (An illustrative sketch of such an operator follows this table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology.
Open Datasets | Yes | We begin by training T5 language models on the C4 dataset. ... We fine-tuned high performing settings from the first experiment together with a baseline on NLP tasks using the GLUE benchmark. (A data-loading sketch for these public datasets follows this table.)
Dataset Splits | No | The paper mentions 'validation data' and 'evaluation accuracy' but does not provide specific details on the train/validation/test splits (percentages, counts, or explicit standard split names).
Hardware Specification | Yes | Table 1. Performance of different T5-Medium fusion methods at 1 million steps, replicated three times for standard deviation. Cost is in TPU v3 4x4 hours.
Software Dependencies | No | The paper mentions models such as T5 and the use of TPUs, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We trained the following 4 experiments (see dimensionalities in Table 8 in Appendix B)... Every model (T5-S, T5-M, T5-L) is trained 1M steps. ... our experiments will show how the post-fusion learning rate affects the performance of the learning, as well as the parameters. To understand how the learning rate affects performance, we ran the normal T5 learning rate schedule with various offsets. (A sketch of an offset learning rate schedule follows this table.)
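
Because the paper specifies the FUSION operator only through mathematical equations (Eq. 5) and gives no pseudocode, the following is a minimal illustrative sketch of what fusing pre-trained weights into a wider initialization could look like. The block-diagonal structure, the function name fuse_dense_weights, and the layer shapes are assumptions made for illustration; this is not the paper's Eq. 5.

```python
# Illustrative sketch only: combine two pre-trained dense-layer weight
# matrices into a wider initialization by placing them on a block diagonal.
# The block structure, zero off-diagonal blocks, and function name are
# assumptions for illustration; they are NOT the paper's FUSION operator.
import numpy as np

def fuse_dense_weights(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """Place w_a and w_b on the diagonal of a larger matrix, zeros elsewhere."""
    fused = np.zeros((w_a.shape[0] + w_b.shape[0],
                      w_a.shape[1] + w_b.shape[1]), dtype=w_a.dtype)
    fused[:w_a.shape[0], :w_a.shape[1]] = w_a
    fused[w_a.shape[0]:, w_a.shape[1]:] = w_b
    return fused

# Example: fuse two small pre-trained layers into one wider layer.
rng = np.random.default_rng(0)
w_small_1 = rng.normal(size=(256, 256)).astype(np.float32)
w_small_2 = rng.normal(size=(256, 256)).astype(np.float32)
w_wide_init = fuse_dense_weights(w_small_1, w_small_2)  # shape (512, 512)
```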
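The C4 and GLUE datasets cited above are publicly available. The snippet below is not from the paper; it shows one common way to load them through TensorFlow Datasets, where "c4/en" and "glue/sst2" are TFDS identifiers rather than the authors' data pipeline.

```python
# Not from the paper: one common way to access the public datasets it uses
# (C4 for pre-training, GLUE for fine-tuning) via TensorFlow Datasets.
import tensorflow_datasets as tfds

# C4, English config. Note: preparing C4 locally is a very large
# download/processing job.
c4_train = tfds.load("c4/en", split="train", shuffle_files=True)

# A GLUE task (SST-2 here) with its standard TFDS splits.
sst2_train = tfds.load("glue/sst2", split="train")
sst2_validation = tfds.load("glue/sst2", split="validation")

# Inspect one fine-tuning example.
for example in sst2_train.take(1):
    print(example["sentence"], example["label"])
```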
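The experiment setup quote refers to running the normal T5 learning rate schedule with various offsets. The sketch below assumes T5's standard inverse-square-root schedule with a 10k-step warmup; the offset parameterization is only an illustration of that idea, not the paper's implementation.

```python
# Illustrative sketch of "the normal T5 learning rate schedule with various
# offsets": T5's inverse-square-root schedule, shifted by an offset so that
# post-fusion training resumes further along the decay curve. The exact
# offset parameterization is an assumption, not the paper's code.
import math

def t5_inverse_sqrt_lr(step: int, warmup_steps: int = 10_000, offset: int = 0) -> float:
    """Return the T5-style learning rate at a (possibly offset) step."""
    effective_step = step + offset
    return 1.0 / math.sqrt(max(effective_step, warmup_steps))

# Compare the learning rate at the same training step under a few offsets.
for offset in (0, 100_000, 500_000):
    print(offset, t5_inverse_sqrt_lr(step=50_000, offset=offset))
```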