Optimistic Verifiable Training by Controlling Hardware Nondeterminism

Authors: Megha Srivastava, Simran Arora, Dan Boneh

NeurIPS 2024

Each reproducibility variable is listed below with its assessed result and the supporting LLM response (quoted from the paper or its repository).

Research Type: Experimental
LLM Response: "Across three different NVIDIA GPUs (A40, TITAN Xp, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. We evaluate our verifiable training method on the two large-scale models listed below with all possible trainer and auditor pairs across NVIDIA GPUs A40, TITAN Xp, and RTX 2080 Ti (see Appendix B for more details)."

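Exact replication across distinct GPU models requires eliminating nondeterminism both within and across devices. As context for the result above, the snippet below is a minimal sketch of the standard PyTorch controls that remove within-GPU run-to-run variation; it is only a baseline configuration, not the paper's full method, which additionally rounds intermediate computations to absorb cross-GPU floating-point differences.

```python
import os
import random

import numpy as np
import torch

# cuBLAS must be pinned to a deterministic workspace before any CUDA work starts.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def make_deterministic(seed: int = 0) -> None:
    """Remove within-GPU sources of run-to-run nondeterminism in PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU and all CUDA devices
    torch.use_deterministic_algorithms(True)   # raise an error on nondeterministic ops
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable autotuning, which varies per run
```
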
Researcher Affiliation: Academia
LLM Response: "Megha Srivastava, Department of Computer Science, Stanford University, megha@cs.stanford.edu; Simran Arora, Department of Computer Science, Stanford University, simarora@stanford.edu; Dan Boneh, Department of Computer Science, Stanford University, dabo@cs.stanford.edu"

Pseudocode: Yes
LLM Response: "Algorithm 1 train, Algorithm 2 audit, Algorithm 3 threshold, Algorithm 4 log"

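The paper gives these procedures as pseudocode only; the following is a hypothetical Python sketch of the optimistic audit step in the spirit of Algorithm 2 (audit), in which an auditor replays training and compares its periodic weight hashes against the trainer's published ones. The function and variable names are illustrative, not the paper's API.

```python
import hashlib
from typing import Iterable, List

import torch

def hash_weights(model: torch.nn.Module) -> str:
    """SHA-256 digest over all model parameters and buffers, in a fixed key order."""
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(tensor.detach().cpu().numpy().tobytes())
    return h.hexdigest()

def audit(trainer_hashes: List[str], replayed_checkpoints: Iterable[torch.nn.Module]) -> int:
    """Replay training checkpoint by checkpoint; return the index of the first
    divergence, or -1 if every checkpoint hash matches and training verifies."""
    for i, model in enumerate(replayed_checkpoints):
        if hash_weights(model) != trainer_hashes[i]:
            return i  # dispute: earliest checkpoint where trainer and auditor disagree
    return -1
```
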
Open Source Code: Yes
LLM Response: "Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training." "Our method is implemented entirely within the pytorch framework (compatible with version 2.3.1), and is available at https://github.com/meghabyte/verifiable-training."

Open Datasets: Yes
LLM Response: "ResNet-50: We train (from random initialization) ResNet-50 (23M) on CIFAR-10 with dataset size 50K & batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: We finetune GPT-2 (117M) on a corpus of Shakespeare text with dataset size 1.1M tokens, batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti."

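Both datasets are standard and publicly available. As a minimal sketch (using torchvision directly rather than the paper's released code, and with commonly used CIFAR-10 normalization constants that are an assumption, not taken from the paper), the ResNet-50 data pipeline with the stated 50K training images and batch size B=64 might look like:

```python
import torch
from torchvision import datasets, transforms

# CIFAR-10 ships with 50,000 training and 10,000 test images.
transform = transforms.Compose([
    transforms.ToTensor(),
    # Widely used CIFAR-10 channel statistics (an assumption, not from the paper).
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=64,   # B=64, as reported for the ResNet-50 experiment
    shuffle=False,   # trainer and auditor must consume batches in the same order
    num_workers=0,   # single-process loading avoids worker-related nondeterminism
)
```
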
Dataset Splits: No
LLM Response: "The paper specifies training and test data (e.g., 'Test accuracy', 'Test Loss Change') but does not provide details on a separate validation set split, such as percentages or specific sample counts for validation."

Hardware Specification: Yes
LLM Response: "Across three different NVIDIA GPUs (A40, TITAN Xp, RTX 2080 Ti)... Appendix B: NVIDIA TITAN Xp: 3840 cores, 12 GB; NVIDIA RTX 2080 Ti: 4352 cores, 11 GB; NVIDIA A40: 10752 cores, 48 GB"

Software Dependencies: Yes
LLM Response: "We implement our verifiable training method entirely on top of the pytorch framework, with torch version 1.13.1 and CUDA version 11.7."

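A trivial environment check against the reported versions (the 2.3.1 compatibility note above comes from the released repository, while the paper's experiments used 1.13.1):

```python
import torch

# Paper: torch 1.13.1 with CUDA 11.7; repository: compatible with torch 2.3.1.
print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))
```
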
Experiment Setup: Yes
LLM Response: "ResNet-50: ...batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: ...batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. The trainer can then choose a training precision b_tr > b_m, a rounding amount b_r ≥ b_m, and a checkpointing interval k to periodically store a small hash sha256(θ) of the model weights θ in a Merkle tree, for efficient comparison with an eventual auditor."
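
To make the setup concrete, here is a hedged sketch of the two mechanisms named above: rounding a tensor's mantissa to b_r bits (one plausible realization of the rounding amount; the paper's rounding with adaptive thresholding is more involved) and folding periodic sha256(θ) checkpoint hashes into a Merkle tree so that an auditor can compare a single root. All names are illustrative.

```python
import hashlib
from typing import List

import torch

def round_mantissa(x: torch.Tensor, b_r: int) -> torch.Tensor:
    """Round each float's mantissa to b_r bits (a simplified stand-in for
    the paper's rounding of intermediate computations)."""
    mantissa, exponent = torch.frexp(x)  # x == mantissa * 2**exponent
    scale = 2.0 ** b_r
    return torch.ldexp(torch.round(mantissa * scale) / scale, exponent)

def merkle_root(leaf_hashes: List[bytes]) -> bytes:
    """Fold per-checkpoint sha256(θ) digests into a single Merkle root."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node if the level is odd
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Every k training steps, the trainer appends a new leaf:
#   leaves.append(hashlib.sha256(weight_bytes).digest())
# and eventually publishes merkle_root(leaves) for the auditor to check.
```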