Efficient Sketches for Training Data Attribution and Studying the Loss Landscape

Authors: Andrea Schioppa

NeurIPS 2024

Reproducibility Variables (each listed with its result and the supporting LLM response):
Research Type: Experimental
To thoroughly evaluate the proposed sketching methods, we present a comprehensive set of experiments. First, we highlight the limitations of existing TDA scaling strategies (Sec. 5.2). Next, we dissect the impact of specific design choices on our sketches (Sec. 5.3). We then introduce and validate an algorithm for intrinsic dimension estimation, enabling computational savings (Sec. 5.4) and showcasing that the intrinsic dimensionality of generative tasks can be large. Finally, we apply our techniques to explore the evolution of the Hessian spectrum during pre-trained language model fine-tuning (Sec. 5.5).
Researcher Affiliation: Industry
Andrea Schioppa, Google DeepMind, Amsterdam, the Netherlands, arischioppa@google.com
Pseudocode: Yes
Listing 9: An algorithm that searches the intrinsic dimension
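The listing itself appears in the paper's appendix. As a rough, hedged illustration of what a search of this kind can look like (not the paper's Listing 9), the sketch below runs a doubling-then-bisection search over candidate subspace dimensions; the `train_and_evaluate` callable, the tolerance handling, and the stopping rule are all assumptions introduced here.

```python
# Hedged sketch of an intrinsic-dimension search, assuming a monotone
# "train in a random d-dimensional subspace and report the metric" callable.
# This is an illustration, not the paper's Listing 9.

from typing import Callable

def search_intrinsic_dimension(
    train_and_evaluate: Callable[[int], float],  # hypothetical subspace training loop
    full_metric: float,       # metric of unconstrained fine-tuning
    delta: float = 0.1,       # acceptable relative gap to full_metric
    d_init: int = 1_000,      # starting dimension
    d_max: int = 10_000_000,  # upper bound on the search
) -> int:
    def good(d: int) -> bool:
        # A dimension "suffices" if subspace training comes within a relative
        # tolerance delta of the unconstrained fine-tuning metric.
        return train_and_evaluate(d) >= (1.0 - delta) * full_metric

    # Phase 1: double the candidate dimension until one suffices.
    d = d_init
    while not good(d):
        if d >= d_max:
            raise ValueError("no sufficient dimension found below d_max")
        d = min(2 * d, d_max)

    # Phase 2: bisect between the last failing and first sufficient dimension,
    # stopping at roughly 10% resolution to limit retraining runs.
    lo, hi = d // 2, d
    while hi - lo > max(1, lo // 10):
        mid = (lo + hi) // 2
        if good(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy usage with a fake monotone metric standing in for real subspace training.
fake = lambda d: min(1.0, d / 50_000)
print(search_intrinsic_dimension(fake, full_metric=1.0, delta=0.1))
```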
Open Source Code: Yes
Python code to implement the proposed algorithms (in JAX) is provided in Appendix B.
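The appendix code is not reproduced here. As a minimal, hedged illustration of the general idea of sketching a high-dimensional gradient into a low-dimensional vector with JAX, the snippet below uses a dense Gaussian random projection; the loss function, shapes, and the dense projection matrix are illustrative assumptions, and the paper's Appendix B implements more memory-efficient sketches than this baseline.

```python
# Hedged sketch: dense random-projection gradient sketch in JAX (baseline
# illustration only; not the paper's Appendix B code).

import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def loss_fn(params, batch):
    # Hypothetical stand-in for a model's loss; replace with the real one.
    x, y = batch
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def sketch_gradient(params, batch, key, sketch_dim=256):
    """Project the flattened gradient into sketch_dim dimensions."""
    grads = jax.grad(loss_fn)(params, batch)
    flat, _ = ravel_pytree(grads)
    # Dense Gaussian projection; memory-heavy for large models, fine for a demo.
    proj = jax.random.normal(key, (sketch_dim, flat.shape[0])) / jnp.sqrt(sketch_dim)
    return proj @ flat

key = jax.random.PRNGKey(0)
params = {"w": jnp.ones((8, 1)), "b": jnp.zeros((1,))}
batch = (jnp.ones((4, 8)), jnp.zeros((4, 1)))
print(sketch_gradient(params, batch, key).shape)  # (256,)
```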
Open Datasets: Yes
We adopt the setup of [6]: a generative task fine-tuning GPT-2 on the WikiText-103 dataset (BART and zsRE results in Appendix A). Our experiments evaluate the efficiency and accuracy of our intrinsic dimension estimation algorithm (presented in Sec. 4). We consider two experimental setups: classification, where we fine-tune RoBERTa on SNLI with accuracy as the target metric; generation, where we fine-tune BART on XSUM for text summarization, using ROUGE-1 and ROUGE-2 for evaluation.
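For reference, the named datasets are publicly available through the Hugging Face `datasets` library. The loader calls below are an assumption about how one could fetch them; the paper does not specify configurations or splits, the identifiers may need adjusting to current Hub names, and the zsRE data is omitted here.

```python
# Hedged sketch: fetching the public datasets named above; config names are
# assumptions, not taken from the paper.

from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")  # GPT-2 generative task
snli = load_dataset("snli")                                 # RoBERTa classification
xsum = load_dataset("xsum")                                 # BART summarization

print({split: len(ds) for split, ds in wikitext.items()})
```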
Dataset Splits: No
The paper mentions training and evaluating on datasets like WikiText-103, SNLI, and XSUM, and refers to 'fine-tuning' and 'evaluation', but does not explicitly state the training/validation/test dataset splits (e.g., percentages or sample counts) used for these experiments.
Hardware Specification: Yes
Table header excerpt: ALGO | GPU (V100) | TPU (V2) | T (ms) | M (GB) ... GPU is V100, TPU is TPUv2. ... Experiments in Sec. 5.4 used 2 V100s in the classification setting and 2 A100s in the generation setting. Experiments in Sec. 5.5 used 2 A100s.
Software Dependencies: No
We use JAX and Hugging Face libraries; experiments in Sec. 5.2 were carried out using one GPU V100 or a TPUv2 (8 cores).
Experiment Setup: Yes
Appendix B.6, B.7, and B.8 provide detailed hyper-parameters for the experiments. For example: 'RoBERTa was fine-tuned with a batch size of 32 for 10k steps with Adam and a constant learning rate of 2 × 10^-5. For the search algorithm (Listing 9) the learning rate was increased to 10^-4, δ = 0.1 and c = 2k steps.'
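Collected for convenience, a hedged sketch of the quoted hyper-parameters as a plain config plus an Adam optimizer; only the numeric values come from the quote, while the dictionary names, the structure, and the use of Optax are illustrative assumptions.

```python
# Hedged sketch: quoted fine-tuning and search hyper-parameters gathered into
# illustrative configs; Optax is assumed only because the paper uses JAX.

import optax

finetune_config = {
    "batch_size": 32,
    "num_steps": 10_000,
    "optimizer": "adam",
    "learning_rate": 2e-5,        # constant schedule
}

search_config = {
    "learning_rate": 1e-4,        # increased for the dimension search
    "delta": 0.1,                 # tolerance δ
    "steps_per_candidate": 2_000, # c = 2k steps
}

optimizer = optax.adam(learning_rate=finetune_config["learning_rate"])
```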