Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Subspace Networks: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Authors: Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6 Experiments. We evaluate decoder-only models (based on Llama 3 [14]) across four large-scale datasets: WikiText (WT) [33], BookCorpus (BC) [63], OpenWebText (OWT) [15], and C4 [37]. For WT, we use the standard splits; for BC and OWT, we randomly select 10% of training data as validation; for C4, due to computational constraints, we report training loss only. The base model has a context length of 1024, embedding dimension 4096, 24 heads, and 8 layers (~2B parameters); larger models (up to 8B parameters) are noted explicitly in ablation sections. We use a base learning rate η = 3e-4 (with warmup and linear decay), weight decay 0.01, and batch size 32, unless otherwise specified.
Researcher Affiliation	Industry	Sameera Ramasinghe, Ajanthan Thalaiyasingam, Gil Avraham, Yan Zuo, Alexander Long; Pluralis Research
Pseudocode	No	The paper contains mathematical formulations and derivations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Provided with supplementary materials.
Open Datasets	Yes	We evaluate decoder-only models (based on Llama 3 [14]) across four large-scale datasets: WikiText (WT) [33], BookCorpus (BC) [63], OpenWebText (OWT) [15], and C4 [37].
Dataset Splits	Yes	For WT, we use the standard splits; for BC and OWT, we randomly select 10% of training data as validation; for C4, due to computational constraints, we report training loss only.
Hardware Specification	Yes	Experiments (except the 8B Llama run on L4 GPUs with internet-based decentralized connections) use A10g GPUs (24GB VRAM) with one layer per GPU.
Software Dependencies	No	The paper mentions "torch.distributed.pipelining" and "TorchTitan [29]" but does not specify their version numbers or the versions of other key software components.
Experiment Setup	Yes	The base model has a context length of 1024, embedding dimension 4096, 24 heads, and 8 layers (~2B parameters); larger models (up to 8B parameters) are noted explicitly in ablation sections. We use a base learning rate η = 3e-4 (with warmup and linear decay), weight decay 0.01, and batch size 32, unless otherwise specified. We use GPipe [18] via torch.distributed.pipelining, integrating our compression into all but the final transformer layer. We initialize U_k with isotropic Gaussian noise and set k = 40, achieving 100× compression.
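
The figures quoted in the Experiment Setup row can be sanity-checked with a short sketch. It is illustrative only and is not the authors' code. It assumes that "compression" means sending k values in place of each embedding vector of dimension 4096, and that "warmup and linear decay" means a linear ramp up to the base rate followed by a linear ramp down to zero. The names compression_ratio and linear_warmup_decay_lr, and the warmup_steps and total_steps values in the usage lines, are hypothetical; the excerpt does not report them.

```python
def compression_ratio(embed_dim: int, k: int) -> float:
    """Ratio of values sent without compression to values sent with it.

    Assumption: each embed_dim-dimensional activation vector is replaced
    by k coefficients before being communicated between pipeline stages.
    """
    return embed_dim / k


def linear_warmup_decay_lr(step: int, total_steps: int, warmup_steps: int,
                           base_lr: float = 3e-4) -> float:
    """Learning rate at a given step (hypothetical schedule shape).

    Ramps linearly up to base_lr over warmup_steps, then decays linearly
    to zero at total_steps. Only base_lr = 3e-4 comes from the quoted text.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Embedding dimension 4096 and k = 40 are taken from the quoted setup.
    print(f"compression ratio: {compression_ratio(4096, 40):.1f}x")

    # warmup_steps and total_steps below are made-up values for illustration.
    for s in (0, 99, 100, 550, 1000):
        lr = linear_warmup_decay_lr(s, total_steps=1000, warmup_steps=100)
        print(f"step {s:4d}: lr = {lr:.2e}")
```

Under these assumptions the ratio is 4096 / 40 = 102.4, which is consistent with the rounded 100× figure in the table.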