Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

Authors: Alexander Long, Chamin P Hewa Koneputugodage, Thalaiyasingam Ajanthan, Yan Zuo, Gil Avraham, Violetta Shevchenko, Hadi Mohaghegh Dolatabadi, Sameera Ramasinghe

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On Qwen2.5-0.5B and Llama-3.2-1B, 10 000 transforms leave FP32 perplexity unchanged (ΔPPL < 0.01; Jensen-Shannon drift < 4 × 10^-5), and we show how to control growth for lower precision datatypes. Applying a transform every 30s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and < 1% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes 60% of the tokens required to train from scratch.
Researcher Affiliation	Industry	Alexander Long, Chamin Hewa Koneputugodage, Thalaiyasingam Ajanthan, Yan Zuo, Gil Avraham, Violetta Shevchenko, Hadi Mohaghegh Dolatabadi, Sameera Ramasinghe; Pluralis Research
Pseudocode	No	The paper describes methods in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	Our framework is simple to implement, with all details needed to implement it given. We use openly available neural network frameworks (discussed in the appendix) and use openly available data (explained in the main paper). The paper does not provide a direct link to the source code for the methodology described, nor does it explicitly state that code for their implementation is being released. Instead, it states the framework is simple to implement and uses openly available neural network frameworks.
Open Datasets	Yes	Evaluation is performed on the WikiText (v2-raw) validation set [37]. We evaluate such attacks on a pretrained Llama 3.2-1B model using the FineWeb dataset [42].
Dataset Splits	No	Evaluation is performed on the WikiText (v2-raw) validation set [37]. The paper mentions a validation set but does not provide specific training/test/validation split percentages or sample counts for the datasets used.
Hardware Specification	Yes	On an NVIDIA A100 GPU (FP32), a morphing step requires approximately 0.05 s, 95% of which is orthogonal matrix generation in FP64, yielding an amortized latency overhead of about 3%.
Software Dependencies	No	We use PyTorch [40] for our implementations. In our implementation we use Muon [19], which also only uses m. Furthermore we use Equation (73), which is easier to implement, rather than Equation (71). Since our framework is simple, we implement it within three frameworks, torchtune [58], torchtitan [34] and NanoGPT [24]. The paper lists several software components and frameworks but does not provide specific version numbers for them.
Experiment Setup	Yes	We consider the case of Llama 3.2 1B (which has dimension D = 2048), with a batch size B = 4, sequence length S = 1024, and morphing every 30s. Applying a transform every 100 steps adds only 1.6% time overhead and <1% memory overhead. We grid search on the learning rate between torchtune's learning rate for finetuning Llama, 2e-5, to torchtitan's learning rate for from-scratch training of Llama, 3e-4.
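
The Hardware Specification row mentions that a morphing step is dominated by orthogonal matrix generation in FP64, and the Research Type row reports that many transforms leave outputs essentially unchanged. The toy sketch below illustrates why such a transform can preserve a function, under loud assumptions: it uses two bare linear maps with no nonlinearity between them, a Householder reflection as a stand-in orthogonal sampler, and hypothetical helper names (`random_orthogonal`, `morph`). It is not the paper's construction, which is not reproduced on this page.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def random_orthogonal(dim, rng):
    """Householder reflection Q = I - 2 v v^T / (v^T v) for a random vector v.

    Assumption: a simple stand-in for whatever orthogonal sampler the paper uses.
    Python floats are IEEE double precision, so this runs in FP64.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_sq = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq
             for j in range(dim)] for i in range(dim)]


def morph(w_first, w_second, q):
    """Return (Q W1, W2 Q^T). The product W2 W1 is unchanged because Q^T Q = I."""
    return matmul(q, w_first), matmul(w_second, transpose(q))


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
dim = 8  # toy size, far smaller than the D = 2048 quoted above
w1 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
w2 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
reference = matmul(w2, w1)

m1, m2 = w1, w2
for _ in range(100):  # hypothetical number of morphing steps
    m1, m2 = morph(m1, m2, random_orthogonal(dim, rng))

# Both halves morphed together: the composed map drifts only by rounding error.
drift = max_abs_diff(matmul(m2, m1), reference)
# A morphed second half paired with the original first half no longer matches.
stale_mismatch = max_abs_diff(matmul(m2, w1), reference)
print(f"drift={drift:.3e} stale_mismatch={stale_mismatch:.3e}")
```

In this toy setting the drift stays at rounding-error scale, while combining halves taken from different morphing steps gives a different map. This is only an intuition for the reported results, not a reproduction of them.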
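
The Experiment Setup row describes a learning-rate grid search bounded by 2e-5 and 3e-4 but does not state the grid itself. As a rough illustration only, the sketch below builds a log-spaced grid between those two endpoints. The number of points and the log spacing are assumptions, and `log_spaced_grid` is a hypothetical helper, not something taken from torchtune or torchtitan.

```python
import math


def log_spaced_grid(low, high, num_points):
    """Return num_points values from low to high, evenly spaced in log10."""
    if num_points < 2:
        raise ValueError("need at least two grid points")
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (num_points - 1)
    return [10 ** (log_low + i * step) for i in range(num_points)]


LR_FINETUNE = 2e-5  # lower endpoint quoted in the Experiment Setup row
LR_SCRATCH = 3e-4   # upper endpoint quoted in the Experiment Setup row
NUM_POINTS = 5      # assumption: the page does not give the grid size

learning_rates = log_spaced_grid(LR_FINETUNE, LR_SCRATCH, NUM_POINTS)
for lr in learning_rates:
    print(f"{lr:.2e}")
```

Log spacing is a common choice when the endpoints differ by more than an order of magnitude, as they do here; a linear grid would place most points near the upper end.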