Lion Secretly Solves a Constrained Optimization: As Lyapunov Predicts

Authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section provides a preliminary investigation of the behaviors of Lion-K with different K. We experiment with the Ks listed in Table 2 on the toy example shown in Figure 1 to confirm that the behavior follows exactly what the theory predicts. Then we focus on the Lion-ℓp optimizer with general p ∈ [1, 2], since it is the most straightforward extension of the original Lion (with p = 1). Experiment Setting: For the ImageNet training, we follow the standard PyTorch ImageNet training code. We train the ResNet-50 and the ViT-B/16 models using batch size 1024 and a cosine learning rate scheduler. For GPT-2 training, we follow the Hugging Face code and train it on OpenWebText using a cosine learning rate scheduler.
Researcher Affiliation | Academia | Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu; The University of Texas at Austin; {lzchen,bliu,kaizhaol,lqiang}@utexas.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the updates are presented as mathematical equations (1) and (2). (A hedged sketch of the published Lion update, generalized to ℓp, appears after this table.)
Open Source Code | No | The paper mentions using the 'standard PyTorch ImageNet training code' and 'Hugging Face code' but links to no source code for its own methodology, nor does it state that the authors release their code.
Open Datasets | Yes | For the ImageNet training, we follow the standard PyTorch ImageNet training code. We train the ResNet-50 and the ViT-B/16 models using batch size 1024 and a cosine learning rate scheduler. For GPT-2 training, we follow the Hugging Face code and train it on OpenWebText using a cosine learning rate scheduler.
Dataset Splits | No | The paper uses ImageNet, CIFAR-10, and OpenWebText, which are standard datasets. While it refers to the 'standard PyTorch ImageNet training code,' it does not give explicit split information (percentages or sample counts) for training, validation, or testing, nor does it cite a source that defines the exact splits used.
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory specifications used to run the experiments.
Software Dependencies | No | The paper mentions the 'PyTorch ImageNet training code' and 'Hugging Face code' but provides no version numbers for these or other software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | Experiment Setting: For the ImageNet training, we follow the standard PyTorch ImageNet training code. We train the ResNet-50 and the ViT-B/16 models using batch size 1024 and a cosine learning rate scheduler. For GPT-2 training, we follow the Hugging Face code and train it on OpenWebText using a cosine learning rate scheduler. (Hedged sketches of this setup follow below.)
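
Since the paper states its updates as equations (1) and (2) rather than pseudocode, below is a minimal sketch of one optimizer step. It assumes the widely published Lion update (sign of an interpolated momentum plus decoupled weight decay) and assumes Lion-ℓp replaces sign(c) with the elementwise map sign(c)·|c|^(p-1), which reduces to sign at p = 1; the paper's exact ℓp normalization may differ, and all names and defaults here are illustrative.

```python
import torch

def lion_lp_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
                 weight_decay=0.0, p=1.0):
    """One Lion-lp step (illustrative sketch); p = 1 recovers the original Lion."""
    # Interpolate the momentum buffer with the fresh gradient.
    c = beta1 * momentum + (1 - beta1) * grad
    # Assumed lp map: sign(c) * |c|**(p - 1); exactly elementwise sign at p = 1.
    direction = torch.sign(c) * c.abs().pow(p - 1)
    # Parameter update with decoupled weight decay, as in the original Lion.
    param.add_(direction + weight_decay * param, alpha=-lr)
    # Momentum tracks the gradient with a second coefficient beta2, as in Lion.
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)
```

Here momentum is a buffer of the same shape as param, initialized to zeros; the hyperparameter defaults are the ones commonly reported for Lion, not values from this paper.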
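
For the quoted experiment setting (batch size 1024 with a cosine learning-rate scheduler), a minimal PyTorch sketch follows; the optimizer choice, epoch count (T_max), and data-loading details are placeholders, not values taken from the paper.

```python
import torch
from torchvision import models

model = models.resnet50()  # ResNet-50; the paper also trains ViT-B/16
# Placeholder optimizer: the paper's experiments use Lion-lp (sketched above).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Cosine learning-rate schedule; T_max = 90 epochs is an assumed value.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)
# Batch size 1024, as stated in the experiment setting (dataset object omitted):
# loader = torch.utils.data.DataLoader(imagenet_train, batch_size=1024,
#                                      shuffle=True, num_workers=8)
```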
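
Similarly, a sketch of the GPT-2 / OpenWebText setup through the Hugging Face libraries; the hub identifiers and from-scratch initialization are assumptions, since the paper only points to the Hugging Face training code.

```python
from transformers import GPT2Config, GPT2LMHeadModel
from datasets import load_dataset

config = GPT2Config()            # default GPT-2 (base) configuration
model = GPT2LMHeadModel(config)  # randomly initialized, i.e. trained from scratch
# "openwebtext" is the usual Hugging Face hub id; the paper's exact data
# pipeline (tokenization, chunking) is not specified and is omitted here.
dataset = load_dataset("openwebtext", split="train")
```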