Characterizing Datapoints via Second-Split Forgetting
Authors: Pratyush Maini, Saurabh Garg, Zachary Lipton, J. Zico Kolter
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose second-split forgetting time (SSFT)... We perform specific ablation studies with datasets... Finally, we investigate second-split dynamics theoretically, analyzing overparametrized linear models. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University¹; Bosch Center for AI²; {pratyushmaini,zlipton}@cmu.edu; {sgarg2, zkolter}@cs.cmu.edu |
| Pseudocode | No | The paper defines FSLT and SSFT mathematically, but it does not include any pseudocode or clearly labeled algorithm blocks. (A hedged sketch of how these quantities could be computed appears after the table.) |
| Open Source Code | Yes | Code for reproducing our experiments can be found at https://github.com/pratyushmaini/ssft. |
| Open Datasets | Yes | We show results on a variety of image classification datasets MNIST [13], CIFAR10 [29], and Imagenette [22]. For experiments in the language domain, we use the SST-2 dataset [45]. |
| Dataset Splits | No | The paper states 'For each of the datasets, we split the training set into two equal partitions (SA, SB).' and discusses training and test sets, but it does not explicitly mention a separate validation split or how it was used. (A sketch of such a split appears after the table.) |
| Hardware Specification | Yes | All experiments can be performed on a single RTX2080 Ti. |
| Software Dependencies | No | The paper mentions models (ResNet-9, BERT-base) and optimizers (SGD) but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, Python versions). |
| Experiment Setup | Yes | Unless otherwise specified, we train a ResNet-9 model [4] using SGD optimizer with weight decay 5e-4 and momentum 0.9. We use the cyclic learning rate schedule [44] with a peak learning rate of 0.1 at the 10th epoch. We train for a maximum of 100 epochs or until we have 5 epochs of 100% training accuracy. We first train on SA, and then using the pre-initialized weights from stage 1, train on SB with the same learning parameters. (A hedged sketch of this recipe appears after the table.) |
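As noted in the Pseudocode row, the paper defines FSLT and SSFT mathematically rather than as an algorithm block. The following is a minimal NumPy sketch of one way those definitions could be turned into code, assuming boolean matrices `correct_a[t, i]` and `correct_b[t, i]` that record whether example `i` of S_A is classified correctly after epoch `t` of first-split and second-split training; the array names, 1-indexed epochs, and the sentinel value are illustrative assumptions, not part of the paper's released implementation.

```python
# Hypothetical sketch: FSLT and SSFT from per-epoch correctness records.
import numpy as np

def learning_time(correct_a: np.ndarray) -> np.ndarray:
    """FSLT: earliest (1-indexed) epoch after which an S_A example is always correct."""
    T, n = correct_a.shape
    fslt = np.empty(n, dtype=int)
    for i in range(n):
        wrong = np.where(~correct_a[:, i])[0]               # epochs where i was misclassified
        fslt[i] = 1 if wrong.size == 0 else wrong[-1] + 2   # never wrong -> learned at epoch 1
    return fslt                                             # value T + 1 means never learned

def forgetting_time(correct_b: np.ndarray) -> np.ndarray:
    """SSFT: earliest (1-indexed) second-split epoch after which an S_A example is never correct again."""
    T, n = correct_b.shape
    ssft = np.empty(n, dtype=int)
    for i in range(n):
        right = np.where(correct_b[:, i])[0]                # epochs where i was still correct
        ssft[i] = 1 if right.size == 0 else right[-1] + 2   # never correct -> forgotten at epoch 1
    return ssft                                             # value T + 1 means never forgotten
```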
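The Dataset Splits row quotes the paper's equal two-way split of the training set into (SA, SB). A minimal sketch of such a split on CIFAR10 follows, assuming a torchvision-style dataset and an arbitrary seed; both are illustrative choices not specified in this report.

```python
# Minimal sketch of an equal two-way split of the training set into S_A and S_B.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
half = len(train_set) // 2
s_a, s_b = random_split(train_set, [half, len(train_set) - half],
                        generator=torch.Generator().manual_seed(0))  # seed is illustrative
```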
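The Experiment Setup row lists the training hyperparameters. Below is a hedged PyTorch sketch of that recipe; it approximates the cited cyclic learning-rate schedule with `OneCycleLR`, and `ResNet9`, `train_one_epoch`, `loader_a`, and `loader_b` are assumed placeholders rather than components of the authors' released code.

```python
# Hedged sketch: SGD (weight decay 5e-4, momentum 0.9), a one-cycle schedule
# peaking at 0.1 around epoch 10, and stopping at 100 epochs or after
# 5 consecutive epochs of 100% training accuracy.
import torch

def fit(model, loader, max_epochs=100, peak_lr=0.1, peak_epoch=10):
    opt = torch.optim.SGD(model.parameters(), lr=peak_lr,
                          momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        opt, max_lr=peak_lr, epochs=max_epochs,
        steps_per_epoch=len(loader), pct_start=peak_epoch / max_epochs)
    perfect_epochs = 0
    for epoch in range(max_epochs):
        # assumed helper: runs one epoch, steps the scheduler per batch,
        # and returns the training accuracy for that epoch
        train_acc = train_one_epoch(model, loader, opt, sched)
        perfect_epochs = perfect_epochs + 1 if train_acc == 1.0 else 0
        if perfect_epochs >= 5:          # early-stopping rule quoted in the table row
            break
    return model

model = ResNet9(num_classes=10)          # assumed architecture implementation
model = fit(model, loader_a)             # stage 1: train on S_A
model = fit(model, loader_b)             # stage 2: continue from stage-1 weights on S_B
```

Reusing the same `fit` call for stage 2 mirrors the quoted description: training on SB starts from the pre-initialized stage-1 weights with the same learning parameters.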