Model-agnostic Measure of Generalization Difficulty
Authors: Akhilan Boopathy, Kevin Liu, Jaedong Hwang, Shu Ge, Asaad Mohammedsaleh, Ila R Fiete
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST < CIFAR10 < Imagenet and fully observable Markov decision processes (MDPs) < partially observable MDPs. Further, we show that classification of complex images < few-shot meta-learning with simple images. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at: https://github.com/FieteLab/inductive-bias-complexity |
| Open Datasets | Yes | Image classification: We estimate the task difficulty of the commonly used image classification benchmarks MNIST, SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009). For all tasks, we approximate r to be the maximum norm of the training data points. |
| Dataset Splits | Yes | We estimate the data dimensionality m by relating the decay rate of nearest neighbor distances in the training set to intrinsic dimensionality (Pope et al., 2021). The theoretical results (Dekkers et al., 1989) require that k and n/k go to infinity as n → ∞. For ImageNet, we choose n = 40000 and k = 200. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | The target generalization error ε is fixed at a level based on the type of the task (see Appendix D, Table 3). The required detail (resolution) per dimension δ is set to the inter-class margin for classification tasks, and to a scale at which state perturbations do not significantly affect trajectories for RL tasks; see Appendix D for full details. For all tasks, we set ε/L as the desired performance level, i.e. the distance in the output space corresponding to an error of ε. The target error is fixed per task category (an error rate of 1.0% for image classification tasks and 0.001 for RL tasks); see Table 3. |
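The dataset-related rows above describe two concrete estimation steps: taking the task-space radius r as the maximum norm of the training points, and estimating the intrinsic dimensionality m from the decay of nearest-neighbor distances. The sketch below illustrates both, but note an assumption: in place of the paper's Dekkers et al. (1989) estimator, it uses the Levina-Bickel k-NN maximum-likelihood estimator (with the MacKay-Ghahramani correction), a standard alternative from the same family of nearest-neighbor methods, so the numerical details differ from the paper's pipeline.

```python
import numpy as np

def max_norm_radius(X):
    """Approximate the task-space radius r as the maximum norm of the
    training data points, as the review table describes."""
    return np.linalg.norm(X, axis=1).max()

def intrinsic_dim_mle(X, k=10):
    """Estimate intrinsic dimensionality from nearest-neighbor distances.

    Assumption: this is the Levina-Bickel MLE (corrected form), not the
    Dekkers et al. (1989) estimator used in the paper; both infer dimension
    from the decay rate of k-NN distances.
    """
    # Pairwise squared distances (fine for small n; use a KD-tree at scale).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    # Distances to the k nearest neighbors, sorted ascending.
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])
    # Per-point estimate: inverse mean log-ratio of the k-th NN distance
    # to each closer neighbor distance (k-1 ratios, corrected by k-2).
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    m_i = (k - 2) / logs.sum(axis=1)
    return m_i.mean()
```

On data lying on a low-dimensional subspace of a higher-dimensional ambient space, the estimate should recover the subspace dimension rather than the ambient one.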
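The experiment-setup row fixes a per-category target error (1.0% for image classification, 0.001 for RL) and converts it to an output-space tolerance ε/L. A minimal sketch of that conversion, assuming L is a Lipschitz-type output scale supplied by the user (the paper's precise definition of L lives in its appendix and is not reproduced here):

```python
# Target error rates per task category, as stated in the setup row (Table 3).
TARGET_ERROR = {"image_classification": 0.01, "rl": 0.001}

def output_tolerance(task_type, L):
    """Distance in output space corresponding to the category's target
    error eps, i.e. eps / L. L is an assumed Lipschitz-type constant."""
    return TARGET_ERROR[task_type] / L
```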