Deconstructing Distributions: A Pointwise Framework of Learning
Authors: Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we initiate a systematic study of pointwise performance in ML (see Figure 1). We show that such pointwise analysis can be useful both as a conceptual way to reason about learning, and as a practical tool for revealing structure in learning models and datasets. Framework: Definition of learning profiles (Section 2.1). We introduce a mathematical object capturing pointwise performance: the profile of a point z with respect to a parameterized family of classifiers T and a test distribution D (see Section 2.1). Taxonomy of points (Section 3). Profiles allow deconstructing popular datasets such as CIFAR-10, CINIC-10, ImageNet, and ImageNet-R into points that display qualitatively distinct behavior (see Figures 3 and 4). Pretrained vs. End-to-End Methods (Section 3.2). Our pointwise measures reveal stark differences between pre-trained and randomly initialized classifiers, even when they share not just identical architectures but also identical global accuracy. Accuracy on the line and CIFAR-10-NEG (Section 4). As an application of our framework, we construct a new out-of-distribution dataset CIFAR-10-NEG: a set of 1000 labeled images from CINIC-10 on which performance of standard models trained on CIFAR-10 is negatively correlated with CIFAR-10 accuracy. Theory: Monotonicity in models of Learning (Section C). For these models, we derive predictions for the monotonicity of certain pointwise performance measures. In particular, all of these models imply pointwise monotonicity behaviors that (as we show empirically) are not always seen in practice. (A minimal sketch of one way such profiles can be computed appears after this table.) |
| Researcher Affiliation | Academia | Gal Kaplun (Harvard) galkaplun@g.harvard.edu; Nikhil Ghosh (UC Berkeley) nikhil_ghosh@berkeley.edu; Saurabh Garg (Carnegie Mellon University) sgarg2@andrew.cmu.edu; Boaz Barak (Harvard) b@boazbarak.org; Preetum Nakkiran (UC San Diego) preetum@nakkiran.org |
| Pseudocode | No | The paper defines concepts formally and provides mathematical formulas (e.g., for the non-monotonicity score), but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Footnote 2 states 'Dataset: https://anonymous.4open.science/r/CIFAR-10-NEG-F697/'. This link is specifically for a dataset ('CIFAR-10-NEG') that was constructed as part of the work, not for the open-source code of the methodology itself. |
| Open Datasets | Yes | Appendix A (Experimental Details), CIFAR-10 Experiments: In the experiments of Section 3.2, models trained from scratch on CIFAR-10 were trained... The pre-trained models were first trained from scratch on the full ImageNet 32×32 (Chrabaszcz et al., 2017) training set... ImageNet Experiments: For ImageNet, we train 10 randomly initialized seeds for three standard architectures... evaluated both on and off distribution (ImageNet-A, ImageNet-R, ImageNet-Sketch, ImageNet-v2 (Hendrycks et al., 2021b;d; Wang et al., 2019; Recht et al., 2019)). |
| Dataset Splits | No | The paper specifies training set sizes ('random subset of the training set (50,000 total samples) of size 10,000 for models trained from scratch and size 5,000 for pre-trained models') and a test set ('CIFAR-10 test set (10,000 total samples)'). However, it does not explicitly define a separate validation set split. |
| Hardware Specification | No | The paper mentions 'ResNet-18 and DenseNet-121' as architectures used, but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions training with 'SGD with a Cosine Annealing learning rate schedule' but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow) with their version numbers. |
| Experiment Setup | Yes | Appendix A (Experimental Details), CIFAR-10 Experiments: In the experiments of Section 3.2, models trained from scratch on CIFAR-10 were trained from a random initialization using SGD with a Cosine Annealing learning rate schedule with initial learning rate η = 0.01, batch size 128, and weight decay 5×10⁻⁴, for 30 epochs. The pre-trained models were first trained from scratch on the full ImageNet 32×32 (Chrabaszcz et al., 2017) training set using those same hyperparameters but for 100 epochs. For fine-tuning pre-trained models on CIFAR-10, the linear classification layer was initialized randomly and then trained using SGD with a learning rate of η = 0.001 and batch size of 128 for 3 epochs with no weight decay. For all training, we used standard data augmentation (i.e., random horizontal flip, random crop of size 32×32 with padding size 4, and mean/std normalization). ImageNet Experiments: For ImageNet, we train 10 randomly initialized seeds for three standard architectures: ResNet-50, DenseNet-121, and DenseNet-169, for 90 epochs with SGD with momentum 0.9, weight decay of 0.0001, a learning rate schedule of [0.1, 0.01, 0.001] for 30 epochs each, and batch size 256 (128 for DenseNet-169). (A minimal PyTorch sketch based on these CIFAR-10 details appears after the table.) |
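The "Research Type" row above quotes the paper's central object, the learning profile of a point z with respect to a parameterized family of classifiers T and a test distribution D, but the exact formula is not reproduced in this report. The NumPy sketch below is therefore an assumption-based illustration of one natural reading: the profile of a test point is its mean correctness across models in the family, viewed as a function of each model's global test accuracy. The function name `pointwise_profiles`, the array layout, and the accuracy binning are illustrative choices, not the authors' implementation.

```python
import numpy as np

def pointwise_profiles(preds, labels, num_bins=10):
    """Illustrative learning-profile computation (not the authors' code).

    preds:  (num_models, num_points) predicted class indices, one row per
            trained model in the family T.
    labels: (num_points,) ground-truth labels for the test set drawn from D.

    Returns (bin_centers, profiles), where profiles[b, i] is the mean
    correctness of point i over the models whose global accuracy falls in
    bin b, i.e. pointwise accuracy as a function of global accuracy.
    """
    correct = (preds == labels[None, :]).astype(float)   # (models, points)
    global_acc = correct.mean(axis=1)                    # one scalar per model

    # Group models into bins of comparable global accuracy.
    edges = np.linspace(global_acc.min(), global_acc.max(), num_bins + 1)
    bin_ids = np.clip(np.digitize(global_acc, edges) - 1, 0, num_bins - 1)

    profiles = np.full((num_bins, correct.shape[1]), np.nan)
    for b in range(num_bins):
        mask = bin_ids == b
        if mask.any():
            profiles[b] = correct[mask].mean(axis=0)

    return 0.5 * (edges[:-1] + edges[1:]), profiles
```

Under this reading, points whose profile decreases as global accuracy rises are the kind of examples from which the CIFAR-10-NEG subset described above is drawn; the paper's non-monotonicity score (not reproduced here) quantifies such deviations from monotone profiles.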
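The "Experiment Setup" row pins down most of the from-scratch CIFAR-10 hyperparameters, so a training loop can be sketched directly from the quoted text. The snippet below is a minimal PyTorch sketch, not the authors' released code: the learning rate, batch size, weight decay, epoch count, cosine annealing schedule, and flip/crop/normalize augmentation follow the quote, while the ResNet-18 backbone, the normalization statistics, and zero-momentum SGD are assumptions the excerpt does not settle.

```python
# Minimal PyTorch sketch of the quoted from-scratch CIFAR-10 setup.
# Quoted: lr 0.01, batch size 128, weight decay 5e-4, 30 epochs, cosine
# annealing, flip/crop/normalize augmentation. Assumed: ResNet-18 backbone,
# CIFAR-10 normalization statistics, momentum (the excerpt does not state one).
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),  # assumed stats
])

train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True,
                                     num_workers=4)

model = torchvision.models.resnet18(num_classes=10)   # assumed architecture
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    model.train()
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the pre-trained variant, the same row states that the backbone is first trained on ImageNet 32×32 with these hyperparameters for 100 epochs, after which the randomly re-initialized linear classification layer is trained with SGD at lr = 0.001 and batch size 128 for 3 epochs with no weight decay; whether the rest of the network is also updated during fine-tuning is not specified in the excerpt.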