Characterizing Datapoints via Second-Split Forgetting

Authors: Pratyush Maini, Saurabh Garg, Zachary Lipton, J. Zico Kolter

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose second-split forgetting time (SSFT)... We perform specific ablation studies with datasets... Finally, we investigate second-split dynamics theoretically, analyzing overparametrized linear models.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Bosch Center for AI; {pratyushmaini,zlipton}@cmu.edu; {sgarg2, zkolter}@cs.cmu.edu
Pseudocode | No | The paper defines FSLT and SSFT mathematically, but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code for reproducing our experiments can be found at https://github.com/pratyushmaini/ssft.
Open Datasets | Yes | We show results on a variety of image classification datasets: MNIST [13], CIFAR10 [29], and Imagenette [22]. For experiments in the language domain, we use the SST-2 dataset [45].
Dataset Splits | No | The paper states "For each of the datasets, we split the training set into two equal partitions (SA, SB)." and discusses training and test sets, but it does not explicitly mention a separate validation split or how it was used.
Hardware Specification | Yes | All experiments can be performed on a single RTX 2080 Ti.
Software Dependencies | No | The paper mentions models (ResNet-9, BERT-base) and optimizers (SGD) but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, Python versions).
Experiment Setup | Yes | Unless otherwise specified, we train a ResNet-9 model [4] using the SGD optimizer with weight decay 5e-4 and momentum 0.9. We use the cyclic learning rate schedule [44] with a peak learning rate of 0.1 at the 10th epoch. We train for a maximum of 100 epochs or until we have 5 epochs of 100% training accuracy. We first train on SA, and then, using the pre-initialized weights from stage 1, train on SB with the same learning parameters.
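A few of the rows above are concrete enough to illustrate with short sketches. The Dataset Splits row quotes the paper's split of the training set into two equal partitions (SA, SB). A minimal sketch of such a split is shown below, assuming a PyTorch dataset; the variable name `train_set` and the fixed seed are illustrative assumptions, since the paper does not say how the partition was drawn.

```python
import torch
from torch.utils.data import random_split

# Split the training set into two equal halves S_A and S_B.
# The seed is an assumption; the paper does not state how the split was drawn.
generator = torch.Generator().manual_seed(0)
n = len(train_set)                      # `train_set` is a placeholder dataset
S_A, S_B = random_split(train_set, [n // 2, n - n // 2], generator=generator)
```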
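The Experiment Setup row lists the optimizer and schedule hyperparameters but, as the Software Dependencies row notes, no code-level details. The sketch below is one plausible PyTorch rendering of the two-stage procedure under those settings; `ResNet9`, `loader_A`, `loader_B`, and the use of `OneCycleLR` to realize the cyclic schedule are assumptions, not the authors' implementation.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

def train_stage(model, loader, epochs=100, peak_epoch=10, peak_lr=0.1):
    """One training stage with the quoted hyperparameters:
    SGD, weight decay 5e-4, momentum 0.9, cyclic LR peaking at 0.1."""
    opt = SGD(model.parameters(), lr=peak_lr, momentum=0.9, weight_decay=5e-4)
    sched = OneCycleLR(opt, max_lr=peak_lr, epochs=epochs,
                       steps_per_epoch=len(loader),
                       pct_start=peak_epoch / epochs)   # peak at the 10th epoch
    loss_fn = torch.nn.CrossEntropyLoss()
    perfect_epochs = 0
    for _ in range(epochs):
        correct = total = 0
        for x, y in loader:
            opt.zero_grad()
            out = model(x)
            loss = loss_fn(out, y)
            loss.backward()
            opt.step()
            sched.step()
            correct += (out.argmax(1) == y).sum().item()
            total += y.numel()
        # Stop after 5 consecutive epochs at 100% training accuracy.
        perfect_epochs = perfect_epochs + 1 if correct == total else 0
        if perfect_epochs >= 5:
            break
    return model

# Stage 1 on S_A, then stage 2 on S_B starting from the stage-1 weights.
# model = ResNet9(num_classes=10)          # hypothetical constructor
# model = train_stage(model, loader_A)
# model = train_stage(model, loader_B)
```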
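Finally, the Pseudocode row observes that FSLT and SSFT are defined only mathematically, with no algorithm blocks. The sketch below computes both statistics from per-epoch correctness records, assuming FSLT is the first epoch after which an example stays correctly classified during training on SA, and SSFT is the first epoch of training on SB after which the example stays misclassified; this paraphrase of the definitions, and the array names, are assumptions rather than the paper's pseudocode.

```python
import numpy as np

def first_split_learning_time(correct_a):
    """FSLT: earliest epoch (1-indexed) after which an example remains
    correctly classified for the rest of first-split training.
    `correct_a` is a bool array of shape (epochs, n_examples)."""
    epochs, n = correct_a.shape
    fslt = np.full(n, epochs + 1)          # never stably learned
    stable = np.ones(n, dtype=bool)
    for t in range(epochs - 1, -1, -1):    # scan backwards over epochs
        stable &= correct_a[t]
        fslt[stable] = t + 1
    return fslt

def second_split_forgetting_time(correct_b):
    """SSFT: earliest epoch of second-split training after which an example
    from S_A remains misclassified (i.e., is never re-learned).
    `correct_b` holds per-epoch predictions on S_A while training on S_B."""
    epochs, n = correct_b.shape
    ssft = np.full(n, epochs + 1)          # never forgotten
    stable_wrong = np.ones(n, dtype=bool)
    for t in range(epochs - 1, -1, -1):
        stable_wrong &= ~correct_b[t]
        ssft[stable_wrong] = t + 1
    return ssft
```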