Model-agnostic Measure of Generalization Difficulty

Authors: Akhilan Boopathy, Kevin Liu, Jaedong Hwang, Shu Ge, Asaad Mohammedsaleh, Ila R Fiete

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST < CIFAR10 < ImageNet and fully observable Markov decision processes (MDPs) < partially observable MDPs. Further, we show that classification of complex images < few-shot meta-learning with simple images.
Researcher Affiliation | Academia | Massachusetts Institute of Technology.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at: https://github.com/FieteLab/inductive-bias-complexity
Open Datasets | Yes | Image classification: We estimate the task difficulty of the commonly used image classification benchmarks MNIST, SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009). For all tasks, we approximate r to be the maximum norm of the training data points.
Dataset Splits | Yes | We estimate the data dimensionality m by relating the decay rate of nearest neighbor distances in the training set to intrinsic dimensionality (Pope et al., 2021). The theoretical results (Dekkers et al., 1989) require that k and n/k go to infinity as n → ∞. For ImageNet, we choose n = 40000 and k = 200.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | The target generalization error ε is fixed at a level based on the type of the task (see Appendix D, Table 3). The required detail (resolution) per dimension δ is set to the inter-class margin for classification tasks, and for RL tasks to a scale at which state perturbations do not significantly affect trajectories. See Appendix D for full details. For all tasks, we set ε/L as the desired performance level, which is the distance in the output space corresponding to an error of ε. For each task, we set a fixed desired performance level corresponding to the category of the task (i.e. a fixed error rate of 1.0% for image classification tasks and a fixed error rate of 0.001 for RL tasks); see Table 3.
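The dimensionality estimation quoted under "Dataset Splits" (relating the decay rate of nearest-neighbor distances to intrinsic dimensionality) can be illustrated with a standard k-NN maximum-likelihood estimator in the style popularized by Pope et al. (2021). This is a minimal sketch under that assumption, not the authors' released code: the function name `intrinsic_dimension_mle` and its parameters are hypothetical, and the paper's actual estimator (following Dekkers et al., 1989) may differ in detail. The last line also shows the paper's stated approximation of r as the maximum norm of the training points.

```python
import numpy as np

def intrinsic_dimension_mle(X, k=20):
    """Levina-Bickel-style MLE of intrinsic dimension from k-NN distances.

    X: (n, d) array of data points; k: number of nearest neighbors.
    (Illustrative sketch, not the paper's exact estimator.)
    """
    # Pairwise squared distances (O(n^2) memory; fine for a small subsample).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                # exclude self-distances
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])   # distances to the k nearest neighbors
    # Per-point inverse estimate: mean over j of log(T_k / T_j), j = 1..k-1,
    # where T_j is the distance to the j-th nearest neighbor.
    logs = np.log(knn[:, -1:] / knn[:, :-1])
    inv_m = logs.mean(axis=1)
    return 1.0 / inv_m.mean()                   # aggregate over points

# Sanity check on data with known intrinsic dimension:
# points on a 2-D linear subspace embedded in R^10.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
m_hat = intrinsic_dimension_mle(X, k=20)        # should come out close to 2

# r as the paper approximates it: maximum norm of the training points.
r = np.linalg.norm(X, axis=1).max()
```

The decay rate of the k-NN distance ratios carries the dimension information: in m dimensions, the expected log-ratio of neighbor distances scales as 1/m, which is why inverting the averaged log-ratios recovers m.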