The Intrinsic Dimension of Images and Its Impact on Learning

Authors: Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low dimensional datasets are easier for neural networks to learn, and models solving these tasks generalize better from training to test data. (A sketch of a typical intrinsic-dimension estimator follows the table.)
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park; Oden Institute for Computational Engineering and Sciences, University of Texas at Austin
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for our experiments may be found here.
Open Datasets | Yes | In this section, we measure the intrinsic dimensions of a number of popular datasets including MNIST (Deng, 2012), SVHN (Netzer et al., 2011), CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), MS-COCO (Lin et al., 2014), and CelebA (Liu et al., 2015). (A torchvision loading sketch for the directly downloadable datasets follows the table.)
Dataset Splits | No | For each dataset we fix a test set of size N = 1700. For all experiments, we use the ResNet-18 (width = 64) architecture (He et al., 2016). We then train models until they fit their entire training set with increasing amounts of training samples and measure the test error. The paper specifies a fixed test set size for synthetic data, but does not explicitly provide percentages or counts for training and validation splits across all experiments. (A sketch of this sample-complexity protocol follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments; it only notes in general terms that "Computation resources were funded by the Sloan Foundation".
Software Dependencies | No | The paper implicitly relies on Python and standard deep learning frameworks (e.g., for ResNet-18 and GAN training), but it does not provide version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | No | The paper mentions using the ResNet-18 architecture and training models until they fit their entire training set, but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations.
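
The intrinsic-dimension measurements quoted in the Research Type and Open Datasets rows rely on nearest-neighbor dimension estimation. As a concrete illustration, here is a minimal sketch of the k-nearest-neighbor MLE estimator in the style of Levina & Bickel (2004), a standard tool in this line of work; the function name, the default k, and the use of scikit-learn are assumptions made for this sketch rather than implementation details taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_intrinsic_dimension(x, k=20):
    """k-NN MLE intrinsic-dimension estimate in the style of Levina & Bickel (2004).

    x: array of shape (n_samples, n_features), e.g. flattened images.
    k: number of nearest neighbors used per point (k >= 2).
    """
    # Distances to the k nearest neighbors of each point (column 0 is the
    # zero distance of a point to itself, so request k + 1 and drop it).
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(x)
    dists, _ = nbrs.kneighbors(x)
    dists = dists[:, 1:]

    # Per-point inverse estimate: mean log-ratio of the k-th neighbor
    # distance to the nearer neighbor distances.
    inv_mle = np.log(dists[:, -1:] / dists[:, :-1]).mean(axis=1)

    # Aggregate by averaging the inverse per-point estimates, then invert
    # to obtain a single dataset-level dimension estimate.
    return 1.0 / inv_mle.mean()
```

On data lying near a d-dimensional manifold embedded in pixel space, this estimate should come out close to d rather than to the number of pixels, which is the comparison behind the paper's headline finding.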
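For the Open Datasets row, most of the cited datasets are publicly available; the snippet below shows one way to fetch the ones that torchvision can download directly. This is purely an illustrative convenience, not a loading procedure described in the paper; ImageNet, MS-COCO, and CelebA are omitted because they require separate downloads or license agreements.

```python
from torchvision import datasets, transforms

# Illustrative only: fetch the publicly downloadable datasets cited in the paper.
to_tensor = transforms.ToTensor()

mnist    = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
svhn     = datasets.SVHN("data", split="train", download=True, transform=to_tensor)
cifar10  = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
```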
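The Dataset Splits and Experiment Setup rows describe the paper's sample-complexity protocol: train a ResNet-18 on increasingly large training subsets until it fits the training data, then measure error on a fixed held-out test set (N = 1700 in the synthetic-data experiments). Below is a minimal PyTorch sketch of that loop; the optimizer, learning rate, batch sizes, epoch cap, and subset-selection strategy are placeholder assumptions, since the paper does not report these details.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision.models import resnet18

def test_error(model, loader, device):
    """Fraction of test examples the model misclassifies."""
    model.eval()
    wrong, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            wrong += (model(x).argmax(dim=1) != y).sum().item()
            total += y.numel()
    return wrong / total

def sample_complexity_curve(train_set, test_set, sample_sizes, num_classes,
                            max_epochs=200, lr=0.1, device="cuda"):
    """Train a fresh ResNet-18 for each training-set size until it fits the
    training data, then record its error on the fixed test set."""
    test_loader = DataLoader(test_set, batch_size=256)
    results = {}
    for n in sample_sizes:
        model = resnet18(num_classes=num_classes).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        train_loader = DataLoader(Subset(train_set, range(n)),
                                  batch_size=128, shuffle=True)
        for _ in range(max_epochs):
            model.train()
            correct, total = 0, 0
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                logits = model(x)
                loss = nn.functional.cross_entropy(logits, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
                correct += (logits.argmax(dim=1) == y).sum().item()
                total += y.numel()
            if correct == total:  # stop once the training set is fully fit
                break
        results[n] = test_error(model, test_loader, device)
    return results
```

Under this protocol, the paper's finding that lower intrinsic dimension makes learning easier would appear as lower test error at a given training-set size n.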