Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Subspace Optimization for Large Language Models with Convergence Guarantees

Authors: Yutong He, Pengrui Li, Yipeng Hu, Chuyan Chen, Kun Yuan

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we empirically validate our theoretical results and thoroughly test the proposed mechanisms. Codes are available at https://github.com/pkumelon/Golore. 6. Experiments We evaluate GaLore and GoLore on several different tasks, including a counter-example problem (1), pre-training and fine-tuning LLMs with real benchmarks. Throughout our experiments, GoLore@x% uses GaLore in the first (100 - x)% iterations and GoLore in the last x% iterations, L.B. GaLore denotes large-batch GaLore, and Full Params. denotes full-parameter training.
Researcher Affiliation	Academia	1Peking University 2Zhongguancun Academy 3Beihang University 4AI for Science Institute, Beijing, China 5National Engineering Laboratory for Big Data Analytics and Applications. Correspondence to: Kun Yuan <EMAIL>.
Pseudocode	Yes	Algorithm 1 GaLore / GoLore algorithm framework using stochastic / deterministic / large-batch gradients
Open Source Code	Yes	Codes are available at https://github.com/pkumelon/Golore.
Open Datasets	Yes	We pre-trained LLaMA-60M on the C4 (Raffel et al., 2020) dataset for 30,000 iterations using various algorithms... fine-tuned pre-trained RoBERTa models (Liu, 2019) on the GLUE benchmark (Wang, 2018)... LLaMA2-7B models (Touvron et al., 2023) on the WinoGrande dataset (Sakaguchi et al., 2021), and OPT-13B models (Zhang et al., 2022) on the BoolQ dataset (Clark et al., 2019).
Dataset Splits	No	Pre-training tasks on C4 dataset. We pre-trained LLaMA-60M on C4 dataset for 30,000 iterations... Fine-tuning tasks on WinoGrande dataset. We fine-tune pre-trained LLaMA2-7B model on the WinoGrande dataset for 30 epochs... Fine-tuning tasks on BoolQ dataset. We fine-tune pre-trained LLaMA2-7B model on the BoolQ dataset on 4 NVIDIA A100 80G GPUs. ... We further fine-tune pre-trained OPT-13B for 1 epoch... The text describes the duration of fine-tuning and the number of iterations but does not explicitly state dataset splits (e.g., percentages for train/val/test).
Hardware Specification	Yes	enabling the pre-training of a 7B model on an NVIDIA RTX 4090 with 24GB of memory. Pre-training tasks on C4 dataset. We pre-trained LLaMA-60M on the C4 (Raffel et al., 2020) dataset for 30,000 iterations on 4 NVIDIA A100 40G GPUs. Fine-tuning tasks on WinoGrande dataset. ... on 4 NVIDIA A100 80G GPUs. Fine-tuning tasks on GLUE benchmark. We fine-tune pre-trained RoBERTa-Base model on the GLUE benchmark for 30 epochs on a single GeForce RTX 4090.
Software Dependencies	No	All implementations utilized the AdamW optimizer in BF16 format. We use MSGD as the subspace optimizer... The paper mentions specific optimizers and a format but does not provide specific version numbers for any software components.
Experiment Setup	Yes	Pre-training tasks on C4 dataset. We use batch size 128, learning rate 1.0e-3, rank 128, scaling factor α = 1, subspace changing frequency τ = 200, and a max sequence length of 256. Table 5. Hyperparameters used in fine-tuning pre-trained RoBERTa-Base model on the GLUE benchmark.
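
The GoLore@x% schedule quoted under Research Type (GaLore for the first (100 - x)% of iterations, GoLore for the last x%) can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. The function names, the 0-indexed step convention, the assumption that the subspace is refreshed whenever the step is a multiple of τ, and the example value x = 20 are all hypothetical. Only the 30,000 iterations and τ = 200 come from the quoted pre-training setup.

```python
def projection_mode(step: int, total_steps: int, x_percent: float) -> str:
    """Return which projector a GoLore@x% run would use at a 0-indexed step.

    Hypothetical helper: GaLore for the first (100 - x)% of iterations,
    GoLore for the last x%, as described in the quoted excerpt.
    """
    if not 0 <= x_percent <= 100:
        raise ValueError("x_percent must lie in [0, 100]")
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    # First step at which the run switches from GaLore to GoLore.
    switch_step = total_steps * (100 - x_percent) / 100
    return "galore" if step < switch_step else "golore"


def refreshes_subspace(step: int, tau: int = 200) -> bool:
    """Assumed convention: the projection is recomputed every tau steps."""
    return step % tau == 0


if __name__ == "__main__":
    total = 30_000  # pre-training iterations quoted in the table
    for step in (0, 23_999, 24_000, 29_999):
        # x = 20 is an illustrative value, not one reported in the table
        print(step, projection_mode(step, total, 20), refreshes_subspace(step))
```

With these assumptions, a GoLore@20% run of 30,000 iterations would switch projectors at step 24,000, which is also a subspace refresh step because 24,000 is a multiple of 200.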