Tools for Verifying Neural Models' Training Data
Authors: Dami Choi, Yonadav Shavit, David K. Duvenaud
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our verification procedures can catch a wide variety of attacks, including all known attacks from the Proof-of-Learning literature. ... We demonstrate the practical effectiveness of our defenses via experiments on two language models (Section 6). |
| Researcher Affiliation | Academia | Dami Choi, U. Toronto & Vector Institute (choidami@cs.toronto.edu); Yonadav Shavit, Harvard University (yonadav@g.harvard.edu); David Duvenaud, U. Toronto & Vector Institute (duvenaud@cs.toronto.edu) |
| Pseudocode | No | The paper describes a combined verification protocol with numbered steps in Appendix A, but it does not present this or any other procedure in a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statements about releasing its source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | Our main experiments are run on GPT-2 [RWC+19] with 124M parameters and trained on the Open Web Text dataset [GCPT19]. ... The data addition attack experiments in Section 6 further use the Github component of the Pile dataset [GBB+20]... In addition to training our own models, we also evaluate Pythia checkpoints [BSA+23] published by Eleuther AI... trained on the Pile dataset. |
| Dataset Splits | No | The paper mentions using a "validation set Dv" and discusses how a Prover can construct a "validation subset Dv by holding out the last nv data-points". However, it does not specify the concrete size or proportion of the validation split used in their own experiments. (A minimal sketch of this hold-out construction follows the table.) |
| Hardware Specification | Yes | All experiments were done using 4 NVIDIA A40 GPUs. |
| Software Dependencies | No | The paper mentions specific models (GPT-2, Pythia) which imply certain underlying frameworks, but it does not provide specific version numbers for any software, libraries, or dependencies used in the experiments. |
| Experiment Setup | Yes | We use a batch size of 491,520 tokens and train for 18,000 steps... saving a checkpoint every 1000 steps. ... We use a cosine learning rate schedule that decays by a factor of 10x by the end of training, with a linear warmup of 2000 steps to a peak learning rate of 0.0006. (An illustrative re-implementation of this schedule follows the table.) |
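
The Dataset Splits row notes that the paper describes a Prover constructing the validation subset Dv by holding out the last nv data points. The following is a minimal illustrative sketch of that construction, not the authors' code; the function name and the example nv value are hypothetical, since the paper does not report the split size it used.

```python
# Minimal sketch (not the authors' code): build the validation subset D_v by
# holding out the last n_v data points of the ordered training data, as the
# paper describes for the Prover. Function name and example n_v are hypothetical.
def split_holdout(dataset, n_v):
    """Return (remaining training data, validation subset D_v), where D_v is
    the last n_v examples of the ordered dataset."""
    assert 0 < n_v < len(dataset)
    return dataset[:-n_v], dataset[-n_v:]

# Example with a toy dataset and an arbitrary hold-out size.
train, d_v = split_holdout(list(range(10_000)), n_v=1_000)
assert len(d_v) == 1_000 and len(train) == 9_000
```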
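
The Experiment Setup row reports a cosine learning-rate schedule with a 2,000-step linear warmup to a peak of 0.0006 that decays by a factor of 10x over 18,000 training steps. Below is a plain re-implementation of that description as a sketch; it is not the authors' released training code, and the exact decay horizon (e.g. whether warmup steps count toward it) is an assumption.

```python
import math

# Sketch of the reported schedule: linear warmup for 2,000 steps to a peak
# learning rate of 6e-4, then cosine decay to one tenth of the peak by step
# 18,000. Treating the decay as spanning only the post-warmup steps is an assumption.
PEAK_LR = 6e-4
MIN_LR = PEAK_LR / 10       # "decays by a factor of 10x by the end of training"
WARMUP_STEPS = 2_000
TOTAL_STEPS = 18_000

def lr_at(step: int) -> float:
    """Learning rate at a given training step."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(max(progress, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

# Spot checks: lr_at(2_000) == 6e-4 (peak), lr_at(18_000) == 6e-5 (peak / 10).
```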