Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Subspace Networks: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Authors: Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments. We evaluate decoder-only models (based on Llama 3 [14]) across four large-scale datasets: WikiText (WT) [33], BookCorpus (BC) [63], OpenWebText (OWT) [15], and C4 [37]. For WT, we use the standard splits; for BC and OWT, we randomly select 10% of training data as validation; for C4, due to computational constraints, we report training loss only. The base model has a context length of 1024, embedding dimension 4096, 24 heads, and 8 layers (~2B parameters); larger models (up to 8B parameters) are noted explicitly in ablation sections. We use a base learning rate η = 3e-4 (with warmup and linear decay), weight decay 0.01, and batch size 32, unless otherwise specified. |
| Researcher Affiliation | Industry | Sameera Ramasinghe, Ajanthan Thalaiyasingam, Gil Avraham, Yan Zuo, Alexander Long; Pluralis Research |
| Pseudocode | No | The paper contains mathematical formulations and derivations but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Provided with supplementary materials. |
| Open Datasets | Yes | We evaluate decoder-only models (based on Llama 3 [14]) across four large-scale datasets: WikiText (WT) [33], BookCorpus (BC) [63], OpenWebText (OWT) [15], and C4 [37]. |
| Dataset Splits | Yes | For WT, we use the standard splits; for BC and OWT, we randomly select 10% of training data as validation; for C4, due to computational constraints, we report training loss only. |
| Hardware Specification | Yes | Experiments (except the 8B Llama run on L4 GPUs with internet-based decentralized connections) use A10g GPUs (24GB VRAM) with one layer per GPU. |
| Software Dependencies | No | The paper mentions "torch.distributed.pipelining" and "TorchTitan [29]" but does not specify their version numbers or the versions of other key software components. |
| Experiment Setup | Yes | The base model has a context length of 1024, embedding dimension 4096, 24 heads, and 8 layers (~2B parameters); larger models (up to 8B parameters) are noted explicitly in ablation sections. We use a base learning rate η = 3e-4 (with warmup and linear decay), weight decay 0.01, and batch size 32, unless otherwise specified. We use GPipe [18] via torch.distributed.pipelining, integrating our compression into all but the final transformer layer. We initialize U_k with isotropic Gaussian noise and set k = 40, achieving 100× compression. |
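
The figures quoted in the Experiment Setup row can be sanity-checked with a short sketch. It is illustrative only and is not the authors' code. It assumes the compression figure comes from replacing each 4096-dimensional activation with k = 40 subspace coefficients, and it assumes a common form of "warmup and linear decay" (linear ramp from zero, then linear decay to zero). The constants `WARMUP_STEPS` and `TOTAL_STEPS` and the function names are hypothetical; the page does not state step counts.

```python
# Values quoted in the Experiment Setup row.
EMBED_DIM = 4096     # embedding dimension d
SUBSPACE_DIM = 40    # subspace rank k
BASE_LR = 3e-4       # base learning rate

# Hypothetical values for illustration; not stated on this page.
WARMUP_STEPS = 1_000
TOTAL_STEPS = 10_000


def compression_ratio(d: int, k: int) -> float:
    """Values sent uncompressed divided by values sent compressed,
    assuming each d-dimensional vector is replaced by k coefficients."""
    return d / k


def lr_at(step: int,
          base_lr: float = BASE_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0.
    This is one common schedule shape, assumed here for illustration."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    ratio = compression_ratio(EMBED_DIM, SUBSPACE_DIM)
    print(f"compression ratio: {ratio:.1f}x")  # prints "compression ratio: 102.4x"
    for step in (0, 500, 1_000, 5_500, 10_000):
        print(step, lr_at(step))
```

Under these assumptions the ratio is 4096 / 40 = 102.4, which is consistent with the roughly 100× compression quoted in the table.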