Parameter-free Clipped Gradient Descent Meets Polyak

Authors: Yuki Takezawa, Han Bao, Ryoma Sato, Kenta Niwa, Makoto Yamada

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5.
Researcher Affiliation | Academia | Yuki Takezawa (1,2), Han Bao (1,2), Ryoma Sato (3), Kenta Niwa (4), Makoto Yamada (2); 1: Kyoto University, 2: OIST, 3: NII, 4: NTT Communication Science Laboratories
Pseudocode | Yes | Algorithm 1: Inexact Polyak Stepsize
Open Source Code | Yes | Our code is contained in the supplementary material.
Open Datasets | Yes | For LSTM, Nano-GPT, and T5, we used the Penn Treebank, Shakespeare, and C4 as training datasets, respectively.
Dataset Splits | No | For SGD and Clipped SGD, we tuned the stepsize and gradient clipping threshold on validation datasets.
Hardware Specification | Yes | We ran all experiments on an A100 GPU.
Software Dependencies | No | The paper references specific model implementations (LSTM: https://github.com/salesforce/awd-lstm-lm, Nano-GPT: https://github.com/karpathy/nanoGPT, T5: https://github.com/PiotrNawrot/nanoT5), which imply software dependencies such as PyTorch, but it does not list version numbers for these or other ancillary software components.
Experiment Setup | Yes | "In our experiments, we ran the clipped gradient descent with the following hyperparameters and tuned the hyperparameters by grid search." Table 2 (clipped gradient descent): learning rate {1, 1.0 × 10^-1, ..., 1.0 × 10^-8}; gradient clipping threshold {0.01, 0.1, 1, 5, 10, 15, 20, ∞}. Table 4 (LSTM): learning rate {100, 50, 10, 1, 0.1, 0.01}; gradient clipping threshold {0.5, 1, ..., 4.5, 5, ∞}; batch size 80.
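The Pseudocode row above points to the paper's Algorithm 1 (Inexact Polyak Stepsize), which is not reproduced here. As a point of reference only, the sketch below combines the two classical ingredients the method builds on: the standard Polyak stepsize (which assumes the optimal value f* is known, an assumption the paper's inexact variant is designed to relax) and gradient clipping, run on a synthetic quadratic. The objective, clipping threshold, and step count are illustrative choices, not the paper's.

```python
# Minimal sketch (NOT the paper's Algorithm 1): classical gradient descent with
# a Polyak stepsize capped by a gradient-clipping threshold, on a synthetic
# quadratic whose optimal value f* = 0 is known.
import numpy as np

def f(x):
    return 0.5 * np.dot(x, x)   # synthetic objective, minimized at x = 0 with f* = 0

def grad_f(x):
    return x

def clipped_polyak_gd(x0, f_star=0.0, clip_threshold=1.0, n_steps=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_f(x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            break
        polyak_lr = (f(x) - f_star) / g_norm**2   # classical Polyak stepsize
        clip_lr = clip_threshold / g_norm         # stepsize implied by clipping
        x = x - min(polyak_lr, clip_lr) * g       # take the more conservative step
    return x

x_hat = clipped_polyak_gd(np.ones(10))
print(f(x_hat))  # close to f* = 0
```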
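The grid search quoted in the Experiment Setup row tunes two hyperparameters of clipped SGD, the learning rate and the clipping threshold, by validation loss (as noted in the Dataset Splits row). Below is a hedged sketch of such a sweep in PyTorch over the Table 2 grids, reading ∞ as "no clipping". The toy linear model, synthetic data, and step count are placeholders, not the paper's LSTM/Nano-GPT/T5 setups.

```python
# Hedged sketch of the Table 2 grid search for clipped SGD: sweep learning rate
# and clipping threshold, keep the configuration with the lowest validation loss.
import itertools
import torch

torch.manual_seed(0)
X = torch.randn(256, 8)
y = X @ torch.randn(8, 1) + 0.1 * torch.randn(256, 1)
X_train, y_train, X_val, y_val = X[:192], y[:192], X[192:], y[192:]

learning_rates = [10.0 ** (-k) for k in range(0, 9)]           # 1, 1e-1, ..., 1e-8
clip_thresholds = [0.01, 0.1, 1, 5, 10, 15, 20, float("inf")]  # inf = no clipping

best = (float("inf"), None)
for lr, clip in itertools.product(learning_rates, clip_thresholds):
    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(200):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X_train), y_train)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
        opt.step()
    val_loss = torch.nn.functional.mse_loss(model(X_val), y_val).item()
    if val_loss < best[0]:
        best = (val_loss, (lr, clip))

print("best (val_loss, (lr, clip_threshold)):", best)
```

The Table 4 sweep for the LSTM follows the same pattern with its own learning-rate and clipping-threshold grids and a batch size of 80.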