Optimistic Verifiable Training by Controlling Hardware Nondeterminism
Authors: Megha Srivastava, Simran Arora, Dan Boneh
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across three different NVIDIA GPUs (A40, TITAN Xp, RTX 2080 Ti), we achieve exact training replication at FP32 precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2 (117M) models. Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. We evaluate our verifiable training method on the two large-scale models listed below with all possible trainer and auditor pairs across NVIDIA GPUs A40, TITAN Xp, and RTX 2080 Ti (see Appendix B for more details). |
| Researcher Affiliation | Academia | Megha Srivastava Department of Computer Science Stanford University megha@cs.stanford.edu Simran Arora Department of Computer Science Stanford University simarora@stanford.edu Dan Boneh Department of Computer Science Stanford University dabo@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 (train), Algorithm 2 (audit), Algorithm 3 (threshold), Algorithm 4 (log) |
| Open Source Code | Yes | Our verifiable training scheme significantly decreases the storage and time costs compared to proof-based systems, and is publicly released at https://github.com/meghabyte/verifiable-training. Our method is implemented entirely within the pytorch framework (compatible with version 2.3.1), and is available at https://github.com/meghabyte/verifiable-training. |
| Open Datasets | Yes | ResNet-50: We train (from random initialization) ResNet-50 (23M) on CIFAR-10 with dataset size 50K & batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: We finetune GPT-2 (117M) on a corpus of Shakespeare text with dataset size 1.1M tokens, batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. |
| Dataset Splits | No | The paper specifies training and test data (e.g., 'Test accuracy', 'Test Loss Change') but does not provide details on a separate validation set split, such as percentages or specific sample counts for validation. |
| Hardware Specification | Yes | Across three different NVIDIA GPUs (A40, TITAN Xp, RTX 2080 Ti)... Appendix B: NVIDIA TITAN Xp: 3840 cores, 12 GB; NVIDIA RTX 2080 Ti: 4352 cores, 11 GB; NVIDIA A40: 10752 cores, 48 GB |
| Software Dependencies | Yes | We implement our verifiable training method entirely on top of the pytorch framework, with torch version 1.13.1 and CUDA version 11.7. |
| Experiment Setup | Yes | ResNet-50: ...batch size B=64. Test accuracy = 90.7% after 100 epochs training on Titan RTX Ti. GPT-2: ...batch size B=8, and sequence length 64. Perplexity = 4.22 after 1 epoch training on Titan RTX Ti. The trainer can then choose a training precision b_tr > b_m, a rounding amount b_r ≥ b_m, and a checkpointing interval k to periodically store a small hash sha256(θ) of the model weights θ in a Merkle tree, for efficient comparison with an eventual auditor. |
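
The Experiment Setup row quotes the core idea of the method: train at a higher precision b_tr than the target model precision b_m and round values back down so that GPU-specific noise in the low-order bits never reaches the stored weights. The sketch below is a minimal illustration of that rounding step in PyTorch, not the authors' implementation: it runs one optimizer step in float64 and rounds the updated weights to float32. The helper name `round_to_model_precision` is hypothetical, and the paper's full scheme also rounds intermediate computations and logs rounding decisions via an adaptive threshold.

```python
import torch

def round_to_model_precision(t: torch.Tensor) -> torch.Tensor:
    # Casting float64 -> float32 discards the extra mantissa bits where
    # hardware-dependent rounding noise accumulates, then returns to float64
    # so training can continue at the higher precision.
    return t.to(torch.float32).to(torch.float64)

# Toy model and one optimizer step in float64 (the higher training precision).
model = torch.nn.Linear(16, 1, dtype=torch.float64)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 16, dtype=torch.float64)
y = torch.randn(8, 1, dtype=torch.float64)

loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# Round the updated weights so that a trainer and an auditor running on
# different GPUs agree bit-for-bit at the target (float32) precision.
with torch.no_grad():
    for p in model.parameters():
        p.copy_(round_to_model_precision(p))
```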
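
The same row mentions a checkpointing interval k and SHA-256 hashes of the model weights stored in a Merkle tree for comparison with an auditor. Below is a minimal sketch of that bookkeeping, assuming a plain SHA-256 over the serialized state dict and a simple pairwise Merkle construction; `hash_state_dict`, `merkle_root`, and the interval k=25 are illustrative choices, not taken from the released repository.

```python
import hashlib
import torch

def hash_state_dict(model: torch.nn.Module) -> bytes:
    """SHA-256 digest over the model's parameters and buffers, in a fixed order."""
    h = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        h.update(name.encode())
        h.update(tensor.detach().cpu().contiguous().numpy().tobytes())
    return h.digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold per-checkpoint hashes into a single Merkle root by pairwise hashing."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Usage: hash the weights every k steps, then commit to the root.
model = torch.nn.Linear(4, 2)
checkpoint_hashes = []
for step in range(1, 101):
    # ... one training step would run here ...
    if step % 25 == 0:  # k = 25 is an arbitrary illustrative interval
        checkpoint_hashes.append(hash_state_dict(model))
print(merkle_root(checkpoint_hashes).hex())
```

An auditor replaying training on different hardware can recompute the same per-checkpoint hashes and compare against the trainer's Merkle root rather than full weight snapshots, which is the storage and time saving over proof-based systems that the paper highlights.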