reproducibilityindex.ai

Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Authors: Megha Srivastava, Simran Arora, Dan Boneh

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. We evaluate our verifiable training method on the two large-scale models listed below with all possible trainer and auditor pairs across NVIDIA GPUs A40, TITAN Xp, and RTX 2080 Ti (see Appendix B for more details).
Researcher Affiliation	Academia	Megha Srivastava, Department of Computer Science, Stanford University, megha@cs.stanford.edu; Simran Arora, Department of Computer Science, Stanford University, simarora@stanford.edu; Dan Boneh, Department of Computer Science, Stanford University, dabo@cs.stanford.edu
Pseudocode	Yes	Algorithm 1 train, Algorithm 2 audit, Algorithm 3 threshold, Algorithm 4 log
Open Source Code	Yes	Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. Our method is implemented entirely within the pytorch framework (compatible with version 2.3.1), and is available at https://github.com/meghabyte/verifiable-training.
Open Datasets	Yes	ResNet-50: We train (from random initialization) ResNet-50 (23M) on CIFAR-10 with dataset size 50K & batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: We finetune GPT-2 (117M) on a corpus of Shakespeare text with dataset size 1.1M tokens, batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti.
Dataset Splits	No	The paper specifies training and test data (e.g., 'Test accuracy', 'Test Loss Change') but does not provide details on a separate validation set split, such as percentages or specific sample counts for validation.
Hardware Specification	Yes	Across three different NVIDIA GPUs (A40, Titan XP, RTX 2080 Ti)... Appendix B: NVIDIA Titan XP: 3840 Cores, 12 GB; NVIDIA RTX 2080 Ti: 4352 Cores, 11 GB; NVIDIA A40: 10752 Cores, 48 GB
Software Dependencies	Yes	We implement our verifiable training method entirely on top of the pytorch framework, with torch version 1.13.1 and CUDA version 11.7.
Experiment Setup	Yes	ResNet-50: ...batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: ...batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. The trainer can then choose a training precision b_tr > b_m, a rounding amount b_r ≤ b_m, and a checkpointing interval k to periodically store small hashes sha256(θ) of model weights θ in a Merkle tree, for efficient comparison with an eventual auditor.
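
The Experiment Setup row mentions storing SHA-256 hashes of model weights in a Merkle tree at each checkpoint, so that a trainer and an auditor can compare runs cheaply. The sketch below illustrates that idea only; it is not the code from the released repository. The function names (`hash_weights`, `merkle_root`, `first_divergence`), the float32 little-endian serialization, and the duplicate-last-node padding rule are all assumptions made for illustration, and the linear scan in `first_divergence` stands in for whatever tree traversal the paper actually uses.

```python
import hashlib
import struct


def hash_weights(weights):
    """Hash a flat list of weights with SHA-256.

    Assumption: weights are serialized as little-endian float32 so that two
    parties holding identical FP32 values produce identical bytes.
    """
    payload = struct.pack(f"<{len(weights)}f", *weights)
    return hashlib.sha256(payload).digest()


def merkle_root(leaves):
    """Compute a Merkle root over checkpoint hashes (one leaf per checkpoint).

    Assumption: an odd-sized level is padded by repeating its last node.
    """
    if not leaves:
        raise ValueError("need at least one checkpoint hash")
    level = list(leaves)  # copy so the caller's list is not modified
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def first_divergence(trainer_leaves, auditor_leaves):
    """Return the index of the first checkpoint whose hashes differ, or None."""
    for i, (t, a) in enumerate(zip(trainer_leaves, auditor_leaves)):
        if t != a:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical checkpoints: tiny weight vectors stand in for real models.
    trainer = [hash_weights([0.1 * step, 0.2]) for step in range(5)]
    auditor = list(trainer)
    auditor[3] = hash_weights([9.0, 0.2])  # simulate a divergent checkpoint

    if merkle_root(trainer) != merkle_root(auditor):
        print("roots differ; first divergent checkpoint:",
              first_divergence(trainer, auditor))
```

Comparing the two roots first means matching runs are confirmed with a single 32-byte comparison; the per-checkpoint hashes are only inspected when the roots disagree.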