Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Optimistic Verifiable Training by Controlling Hardware Nondeterminism
Authors: Megha Srivastava, Simran Arora, Dan Boneh
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. We evaluate our verifiable training method on the two large-scale models listed below with all possible trainer and auditor pairs across NVIDIA GPUs A40, TITAN Xp, and RTX 2080 Ti (see Appendix B for more details). |
| Researcher Affiliation | Academia | Megha Srivastava, Department of Computer Science, Stanford University; Simran Arora, Department of Computer Science, Stanford University; Dan Boneh, Department of Computer Science, Stanford University |
| Pseudocode | Yes | Algorithm 1 train, Algorithm 2 audit, Algorithm 3 threshold, Algorithm 4 log |
| Open Source Code | Yes | Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. Our method is implemented entirely within the pytorch framework (compatible with version 2.3.1), and is available at https://github.com/meghabyte/verifiable-training. |
| Open Datasets | Yes | ResNet-50: We train (from random initialization) ResNet-50 (23M) on CIFAR-10 with dataset size 50K & batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: We finetune GPT-2 (117M) on a corpus of Shakespeare text with dataset size 1.1M tokens, batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. |
| Dataset Splits | No | The paper specifies training and test data (e.g., 'Test accuracy', 'Test Loss Change') but does not provide details on a separate validation set split, such as percentages or specific sample counts for validation. |
| Hardware Specification | Yes | Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti)... Appendix B: NVIDIA Titan XP: 3840 cores, 12 GB; NVIDIA RTX 2080 Ti: 4352 cores, 11 GB; NVIDIA A40: 10752 cores, 48 GB |
| Software Dependencies | Yes | We implement our verifiable training method entirely on top of the pytorch framework, with torch version 1.13.1 and CUDA version 11.7. |
| Experiment Setup | Yes | ResNet-50: ...batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: ...batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. The trainer can then choose a training precision b_tr > b_m, a rounding amount b_r ≤ b_m, and a checkpointing interval k to periodically store small hash sha256(θ) of model weights θ in a Merkle tree, for efficient comparison with an eventual auditor. |
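
The Experiment Setup row mentions storing sha256(θ) of the model weights every k steps in a Merkle tree so that trainer and auditor can compare runs cheaply. The following is a minimal sketch of that idea only, not the authors' implementation: the function names (`hash_weights`, `checkpoint_leaves`, `merkle_root`), the float32 little-endian serialization, and the choice to duplicate the last node on odd-sized levels are all assumptions made for illustration.

```python
import hashlib
import struct


def hash_weights(weights):
    """SHA-256 digest of a flat list of weights.

    Assumption: weights are serialized as little-endian float32 values.
    """
    payload = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(payload).digest()


def checkpoint_leaves(weight_history, k):
    """Hash the weights at every k-th training step (steps are 1-indexed)."""
    return [
        hash_weights(w)
        for step, w in enumerate(weight_history, start=1)
        if step % k == 0
    ]


def merkle_root(leaves):
    """Combine leaf digests pairwise until a single root digest remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd-sized levels by repeating the last node.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    # Toy "training run": eight steps of a two-parameter model.
    trainer_run = [[0.5 * i, 1.5 * i] for i in range(1, 9)]
    auditor_run = [list(w) for w in trainer_run]

    k = 2  # checkpointing interval
    trainer_root = merkle_root(checkpoint_leaves(trainer_run, k))
    auditor_root = merkle_root(checkpoint_leaves(auditor_run, k))
    print("roots match:", trainer_root == auditor_root)
```

If the two roots match, every checkpointed set of weights matched bit for bit; if they differ, the subtrees can be compared top-down to locate the first diverging checkpoint without exchanging the weights themselves.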