Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," Under Review, 2026).

LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing

Authors: Ruijie (Ray) Zhang, Ziyue (Alvin) Liu, Zhengyang Wang, Zheng Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively validate the benefits of LaX on pre-training tasks with ViT-Base/Large and LLaMA-like models ranging from 60M to 1B parameters. LaX boosts low-rank model performance to match or exceed the full-rank baselines while using 2-3× fewer parameters. When equipped with low-rank adapters (i.e., LoRA [23]) for fine-tuning LLaMA-7/13B, LaX consistently improves performance on arithmetic and common sense reasoning tasks with negligible cost.
Researcher Affiliation | Academia | Ruijie Zhang*, Ziyue Liu*, Zhengyang Wang, Zheng Zhang; University of California at Santa Barbara; EMAIL, EMAIL
Pseudocode | No | The paper describes the method in Section 3 ('The LaX Method', 'Variants of LaX', 'LaX Gates') but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | We provide our code here
Open Datasets | Yes | We pretrain ViT-Base/Large (224 resolution with 16×16 patch size) and its corresponding low-rank variants on ImageNet-1K [49]. ...pre-training LLaMA-like models from 60M to 1B parameters on C4 [48] without data repetition... All the datasets used are public, and we provide full hyperparameters in the Appendix.
Dataset Splits | Yes | We pretrain ViT-Base/Large (224 resolution with 16×16 patch size) and its corresponding low-rank variants on ImageNet-1K [49]. ... For this evaluation, we configure LaX with Linear Gate. Tab. 7 shows that augmenting LoRA with LaX yields consistent accuracy improvements across all six arithmetic subtasks for both LLaMA-7B and LLaMA-13B. ... Following [25, 41], we merge the training datasets from all eight commonsense reasoning tasks into a unified training set and evaluate the performance separately on each task.
Hardware Specification | Yes | Pre-training large foundation models is highly computationally expensive. For instance, a single pre-training run of an SVD-based ViT-B model under our setup requires approximately 450 A100 GPU hours.
Software Dependencies | No | The paper does not state version numbers for key software components such as programming languages, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | All models are trained from scratch for 300 epochs according to the setting of [9]. For SVD and CoLA, we only place Inter-Layer LaX with the Tensor Gate. For tensor train, we use both Inter-Layer LaX and Intra-Layer LaX with the Tensor Gate. More details of the model and training configurations are provided in Appendix A.1. ... LoRA hyperparameters (LLaMA-7B / LLaMA-13B): rank r 32 / 32; α 64 / 64; dropout 0.0 / 0.0; optimizer AdamW / AdamW; LR 3e-4 / 3e-4; scheduler Linear / Linear; batch size 16 / 16; accumulation steps 4 / 4; cutoff length 256 / 256; warmup steps 100 / 100; epochs 3 / 3.
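The reported LoRA settings for LLaMA-7B/13B can be sanity-checked with a little arithmetic. This is an illustrative sketch, not the authors' code: the `LoRAConfig` dataclass and `lora_trainable_params` helper are hypothetical. It assumes the standard LoRA scaling of α/r applied to the low-rank update BA, and it treats batch size × accumulation steps as samples per optimizer step on a single replica, because the GPU count is not stated. The 4096 hidden size is LLaMA-7B's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRAConfig:
    """Hypothetical container for the reported LoRA fine-tuning settings."""
    rank: int = 32
    alpha: int = 64
    dropout: float = 0.0
    lr: float = 3e-4
    batch_size: int = 16
    accumulation_steps: int = 4
    cutoff_len: int = 256
    warmup_steps: int = 100
    epochs: int = 3

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.rank

    @property
    def effective_batch_size(self) -> int:
        # Samples per optimizer step on one replica (assumption: GPU count unknown).
        return self.batch_size * self.accumulation_steps


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # A is (rank x d_in) and B is (d_out x rank); the frozen W is (d_out x d_in).
    return rank * (d_in + d_out)


cfg = LoRAConfig()
hidden = 4096  # LLaMA-7B hidden size
full = hidden * hidden  # parameters in one frozen square projection
adapter = lora_trainable_params(hidden, hidden, cfg.rank)

print(cfg.scaling)               # 2.0
print(cfg.effective_batch_size)  # 64
print(full, adapter)             # 16777216 262144
print(full // adapter)           # 64: the adapter is 1/64 the size of W
```

With r = 32 and α = 64, the update is scaled by 2, and each adapted 4096 × 4096 projection trains only 1/64 as many parameters as the frozen weight it augments.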
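The ViT setup and the "fewer parameters" claim both come down to simple counting. The sketch below assumes a plain two-factor low-rank replacement W ≈ UV for one ViT-Base MLP projection (hidden size 768, MLP size 3072). It ignores biases and any LaX gate parameters, and it uses rank 256 as a hypothetical value. It does not reproduce the paper's SVD, CoLA, or tensor-train parameter accounting.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    # Non-overlapping square patches tiled along each spatial axis.
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    per_side = image_size // patch_size
    return per_side * per_side


def dense_params(d_in: int, d_out: int) -> int:
    # Weight matrix only; biases are ignored.
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    # W ~= U @ V with U: (d_out x rank) and V: (rank x d_in).
    return rank * (d_in + d_out)


# 224x224 input with 16x16 patches, as in the ViT pre-training setup.
print(num_patches(224, 16))  # 196

# One ViT-Base MLP projection: 768 -> 3072.
full = dense_params(768, 3072)
low = low_rank_params(768, 3072, rank=256)  # rank 256 is a hypothetical choice
print(full, low, full / low)  # 2359296 983040 2.4
```

A rank-r factorization of a d_in × d_out matrix saves parameters only when r < d_in·d_out / (d_in + d_out), which is 614.4 for this layer.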
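The commonsense protocol quoted under Dataset Splits (one merged training set, with evaluation run separately on each task) can be sketched as follows. The task names, toy examples, and the always-"yes" stand-in predictor are hypothetical placeholders. They are not the actual eight benchmarks or the fine-tuned model.

```python
from collections.abc import Callable


def merge_train_sets(train_sets: dict[str, list[dict]]) -> list[dict]:
    # Tag every example with its source task, then pool everything together.
    merged = []
    for task, examples in train_sets.items():
        for ex in examples:
            merged.append({**ex, "task": task})
    return merged


def per_task_accuracy(
    test_sets: dict[str, list[dict]], predict: Callable[[str], str]
) -> dict[str, float]:
    # Score each task's test split on its own rather than on a pooled test set.
    scores = {}
    for task, examples in test_sets.items():
        correct = sum(predict(ex["input"]) == ex["answer"] for ex in examples)
        scores[task] = correct / len(examples)
    return scores


# Hypothetical toy data standing in for the real benchmarks.
train_sets = {
    "task_a": [{"input": "q1", "answer": "yes"}],
    "task_b": [{"input": "q2", "answer": "no"}, {"input": "q3", "answer": "yes"}],
}
test_sets = {
    "task_a": [{"input": "t1", "answer": "yes"}, {"input": "t2", "answer": "no"}],
    "task_b": [{"input": "t3", "answer": "yes"}],
}

merged = merge_train_sets(train_sets)  # one training pool for fine-tuning
scores = per_task_accuracy(test_sets, predict=lambda text: "yes")
print(len(merged), scores)  # 3 {'task_a': 0.5, 'task_b': 1.0}
```

Keeping the task tag on merged examples makes it easy to check the training mixture, while reporting one accuracy per task keeps a large benchmark from masking regressions on a small one.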