Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing
Authors: Ruijie (Ray) Zhang, Ziyue (Alvin) Liu, Zhengyang Wang, Zheng Zhang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively validate the benefits of LaX on pre-training tasks with ViT-Base/Large and LLaMA-like models ranging from 60M to 1B parameters. LaX boosts low-rank model performance to match or exceed the full-rank baselines while using 2-3× fewer parameters. When equipped with low-rank adapters (i.e., LoRA [23]) for fine-tuning LLaMA-7/13B, LaX consistently improves performance on arithmetic and common sense reasoning tasks with negligible cost. |
| Researcher Affiliation | Academia | Ruijie Zhang*, Ziyue Liu*, Zhengyang Wang, Zheng Zhang University of California at Santa Barbara EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methods in Section 3, 'The LaX Method', 'Variants of LaX', 'LaX Gates', but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We provide our code here |
| Open Datasets | Yes | We pretrain ViT-Base/Large (224 resolution with 16×16 patch size) and its corresponding low-rank variants on ImageNet-1K [49]. ...pre-training LLaMA-like models from 60M to 1B parameters on C4 [48] without data repetition... All the datasets used are public, and we provide full hyperparameters in the Appendix. |
| Dataset Splits | Yes | We pretrain ViT-Base/Large (224 resolution with 16×16 patch size) and its corresponding low-rank variants on ImageNet-1K [49]. ... For this evaluation, we configure LaX with Linear Gate. Tab. 7 shows that augmenting LoRA with LaX yields consistent accuracy improvements across all six arithmetic subtasks for both LLaMA-7B and LLaMA-13B. ... Following [25, 41], we merge the training datasets from all eight commonsense reasoning tasks into a unified training set and evaluate the performance separately on each task. |
| Hardware Specification | Yes | Pre-training large foundation models is highly computationally expensive. For instance, a single pre-training run of an SVD-based ViT-B model under our setup requires approximately 450 A100 GPU hours. |
| Software Dependencies | No | The paper does not explicitly state specific version numbers for key software components such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | All models are trained from scratch for 300 epochs according to the setting of [9]. For SVD and CoLA, we only place Inter-Layer LaX with the Tensor Gate. For tensor train, we use both Inter-Layer LaX and Intra-Layer LaX with the Tensor Gate. More details of the model and training configurations are provided in Appendix A.1. ... Hyperparameters (LoRA), listed as LLaMA-7B / LLaMA-13B: Rank r 32 / 32; α 64 / 64; Dropout 0.0 / 0.0; Optimizer AdamW / AdamW; LR 3e-4 / 3e-4; Scheduler Linear / Linear; Batch size 16 / 16; Accumulation steps 4 / 4; Cutoff length 256 / 256; Warmup steps 100 / 100; Epochs 3 / 3 |
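
The LoRA fine-tuning hyperparameters quoted in the Experiment Setup row could be collected into a small configuration object, which may help when trying to reproduce the run. The following is a minimal sketch, not the authors' code: the class name `LoRAFineTuneConfig` and its field names are hypothetical, the values are taken from the row above (they are identical for LLaMA-7B and LLaMA-13B), the effective batch size assumes a single device, and the scaling factor assumes the common LoRA convention of α / r, which this page does not confirm for the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoRAFineTuneConfig:
    """Hypothetical container for the LoRA hyperparameters listed above.

    Field names are illustrative and do not come from the paper's code.
    """
    rank: int = 32                 # Rank r
    alpha: int = 64                # LoRA alpha
    dropout: float = 0.0
    optimizer: str = "AdamW"
    learning_rate: float = 3e-4
    scheduler: str = "linear"
    batch_size: int = 16
    accumulation_steps: int = 4
    cutoff_length: int = 256
    warmup_steps: int = 100
    epochs: int = 3

    @property
    def effective_batch_size(self) -> int:
        # Assumption: a single device, so the number of samples per
        # optimizer step is batch size times gradient accumulation steps.
        return self.batch_size * self.accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Assumption: the usual LoRA convention, scaling = alpha / rank.
        return self.alpha / self.rank


config = LoRAFineTuneConfig()
print(config.effective_batch_size, config.lora_scaling)
```

Under these assumptions, each optimizer step would see 16 × 4 = 64 samples and the adapter update would be scaled by 64 / 32 = 2.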