Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape
Authors: Tao Li, Zhengbao He, Yujun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of Flat-LoRA on diverse tasks: natural language understanding, image classification, dialogue generation, mathematical reasoning, coding abilities, and text-to-image generation. We then demonstrate its enhanced out-of-domain generalization ability, followed by ablation studies and discussions. |
| Researcher Affiliation | Collaboration | 1Department of Automation, Shanghai Jiao Tong University, Shanghai, China 2Huawei Noah's Ark Lab. Correspondence to: Xiaolin Huang <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations in sections like "3. Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape" but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/nblt/Flat-LoRA. |
| Open Datasets | Yes | Setting. We fine-tune the T5-Base model on several datasets from the GLUE benchmark, including MNLI, SST, CoLA, QNLI, and MRPC, following Wang et al. (2024)... We fine-tune the CLIP ViT-B/32 model on five image classification tasks, including CIFAR10/100 (Krizhevsky & Hinton, 2009), Cars (Krause et al., 2013), SVHN (Netzer et al., 2011), and DTD (Cimpoi et al., 2014)... We fine-tune Llama 2-7B (Touvron et al., 2023) on three tasks: chat, math, and code... For chat task, we fine-tune the model on WizardLM (Xu et al., 2023) and test on the MT-Bench dataset (Zheng et al., 2023). For math task, we fine-tune the model on MetaMathQA (Yu et al., 2024) and evaluate it on GSM8K evaluation set (Cobbe et al., 2021). For code task, we fine-tune the model on Code-Feedback (Zheng et al., 2024) and evaluate it on HumanEval (Chen et al., 2021)... We fine-tune the SDXL model (Podell et al., 2023) with the pipeline of Dreambooth (Ruiz et al., 2023) and the scripts implemented by Hugging Face. The fine-tuning dataset, 3D Icons (https://huggingface.co/datasets/linoyts/3d_icon), contains 23 training images, all of which have a square. |
| Dataset Splits | Yes | We fine-tune the T5-Base model on several datasets from the GLUE benchmark... Performance is evaluated on the development set... For math task, we fine-tune the model on MetaMathQA (Yu et al., 2024) and evaluate it on GSM8K evaluation set (Cobbe et al., 2021). For code task, we fine-tune the model on Code-Feedback (Zheng et al., 2024) and evaluate it on HumanEval (Chen et al., 2021). Training uses 52K samples for chat and 100K samples for math and code tasks... We fine-tune the Llama 2-13B model on the Alpaca dataset (Taori et al., 2023)... The model is evaluated on InstructEval (Chia et al., 2023)... |
| Hardware Specification | Yes | The training settings are the same as in Section 4.3, and we use a micro-batch size of 2, running on an NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using T5-Base, CLIP ViT-B/32, Llama 2-7B, SDXL models, and Hugging Face scripts, and BF16 precision, but does not provide specific version numbers for any software dependencies like PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | Setting. We fine-tune the T5-Base model on several datasets from the GLUE benchmark... We use LoRA with rank 8 and LoRA alpha 16. We fine-tune the models for 10 epochs using a cosine learning rate schedule; except for MNLI and QNLI, we use 1 epoch. We use a learning rate of 0.0005 for LoRA fine-tuning and 0.0001 for full fine-tuning. The random perturbation strength σ is set to 0.05 with a cosine-increasing strategy. ... We fine-tune the CLIP ViT-B/32 model on five image classification tasks... We experiment with LoRA using ranks of 8 and 16 and fine-tune the models for 10 epochs under a cosine annealing schedule. The learning rate is set to 0.0005 for LoRA and Flat-LoRA and 0.0001 for full fine-tuning, with a weight decay of 0.1. The perturbation strength σ is set to 0.15 for Flat-LoRA with a cosine-increasing strategy. ... To evaluate the scalability of Flat-LoRA, we further conduct experiments on large language models. Specifically, we fine-tune Llama 2-7B (Touvron et al., 2023) on three tasks... We use a learning rate of 5e-4 and employ a cosine learning rate scheduler with a warmup ratio of 0.03. The LoRA rank is set to 8 with LoRA alpha 16, and the training epoch is set to 2. The backbone uses BF16 precision, with the parameters of LoRA modules set to FP32 precision. ... The random perturbation strength σ is set to 0.05 with a cosine-increasing strategy. ... We fine-tune the SDXL model (Podell et al., 2023)... for 500 steps with a constant learning rate of 0.0001. The batch size is set to 1. The LoRA rank and alpha are set to 4. The random perturbation strength σ is set to 0.1 for Flat-LoRA. |
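The setup above repeatedly mentions a "cosine-increasing strategy" for the perturbation strength σ. As a minimal sketch of one plausible reading, the snippet below ramps σ from 0 up to its target value over training with a cosine curve; the function name and exact formula are assumptions for illustration, not the paper's verified implementation, and the paper's own schedule may differ.

```python
import math

def cosine_increasing_sigma(step: int, total_steps: int, sigma_max: float) -> float:
    """Hypothetical cosine-increasing schedule for the perturbation
    strength sigma: starts at 0 and rises smoothly to sigma_max by the
    final step (assumed form; see the paper/code for the exact schedule)."""
    progress = step / total_steps
    return sigma_max * 0.5 * (1.0 - math.cos(math.pi * progress))

# Example with the GLUE setting sigma = 0.05:
start = cosine_increasing_sigma(0, 1000, 0.05)    # 0.0
middle = cosine_increasing_sigma(500, 1000, 0.05) # 0.025
end = cosine_increasing_sigma(1000, 1000, 0.05)   # 0.05
```

Under this reading, early training perturbs weights only slightly while LoRA factors are still far from a good solution, and the flatness-seeking perturbation reaches full strength (e.g. σ = 0.05 for GLUE, 0.15 for CLIP ViT-B/32) only toward the end.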