A Picture of the Space of Typical Learnable Tasks
Authors: Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, Han Kheng Teoh, Mark Transtrum, James Sethna, Pratik Chaudhari
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use classification tasks constructed from the CIFAR10 and ImageNet datasets to study these phenomena. Code is available at https://github.com/grasp-lyrl/picture_of_space_of_tasks. Experiments in this paper required about 30,000 GPU-hours. |
| Researcher Affiliation | Academia | 1University of Pennsylvania 2Cornell University 3Brigham Young University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/grasp-lyrl/picture_of_space_of_tasks. |
| Open Datasets | Yes | We performed experiments using two datasets. 1. CIFAR10 (Krizhevsky, 2009) has 10 classes... 2. ImageNet (Deng et al., 2009) has 1000 classes... |
| Dataset Splits | No | The paper mentions training data and test data, but does not explicitly provide details about a validation dataset split (e.g., percentages or sample counts for a validation set). |
| Hardware Specification | No | The paper states 'Experiments in this paper required about 30,000 GPU-hours' and 'Models are trained on 4 GPUs' but does not specify the GPU models or types used. |
| Software Dependencies | No | The paper mentions 'FFCV (Leclerc et al., 2022)' as a data-loading library and 'Numpy's memmap functionality' but does not provide specific version numbers for common software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | All models are trained in mixed-precision (32-bit weights, 16-bit gradients) using stochastic gradient descent (SGD) with Nesterov's acceleration with momentum coefficient set to 0.9 and cosine annealing of the learning rate schedule. Batch-normalization parameters are excluded from weight decay. CIFAR10 datasets use padding (4 pixels) with random cropping to an image of size 28×28 or 32×32 respectively for data augmentation. CIFAR10 images additionally have random left/right flips for data augmentation. Images are finally normalized to have mean 0.5 and standard deviation 0.25. Supervised learning models (including fine-tuning) for CIFAR10 are trained for 100 epochs with a batch-size of 64 and weight decay of 10⁻⁵ using the Wide-Resnet. Episodic meta-learners are trained using a Wide-Resnet and with the prototypical loss (Snell et al., 2017a). For the 2-way meta-learner, each episode contains 20 query samples and 10 support samples. For the 5-way meta-learner, each episode contains 50 query samples and 10 support samples. ... Models are trained for around 750 epochs ... Models are trained for 200 epochs for 2-way classification problems and for 500 epochs when trained on the entirety of CIFAR10 with the Adam optimizer and an initial learning rate of 0.001. ... ImageNet models are trained for 40 epochs with progressive resizing: the image size is increased from 160 to 224 between epochs 29 and 34. Models are trained on 4 GPUs with a batch-size of 512. The training uses two types of augmentations: random-resized crop and random horizontal flips. Additionally, we use label smoothing with the smoothing parameter set to 0.1. |
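
The CIFAR10 supervised-training recipe quoted in the Experiment Setup row maps onto a fairly standard PyTorch loop. The sketch below is a minimal, hedged illustration of that configuration, not the authors' released code: it assumes torchvision's `wide_resnet50_2` as a stand-in for the paper's Wide-ResNet, an initial learning rate of 0.1, and torchvision data loading (the paper uses FFCV), none of which are stated in the quoted text. What it does reproduce from the quote: SGD with Nesterov momentum 0.9, cosine annealing, weight decay 10⁻⁵ with batch-norm parameters excluded, batch size 64, 100 epochs, mixed precision, and the CIFAR10 augmentation and normalization described above.

```python
# Hedged sketch of the quoted CIFAR10 training setup (not the authors' code).
import torch
import torchvision
import torchvision.transforms as T
from torch.cuda.amp import GradScaler, autocast


def split_decay_params(model):
    """Put batch-norm parameters in a no-weight-decay group, as the quote describes."""
    bn_types = (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)
    decay, no_decay = [], []
    for module in model.modules():
        for _, p in module.named_parameters(recurse=False):
            (no_decay if isinstance(module, bn_types) else decay).append(p)
    return decay, no_decay


def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # CIFAR10 augmentation: 4-pixel padding with a random 32x32 crop,
    # random horizontal flips, then normalization to mean 0.5 / std 0.25.
    transform = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean=[0.5] * 3, std=[0.25] * 3),
    ])
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=transform)
    loader = torch.utils.data.DataLoader(
        train_set, batch_size=64, shuffle=True, num_workers=4)

    # Stand-in architecture; the paper's exact Wide-ResNet variant is not quoted.
    model = torchvision.models.wide_resnet50_2(num_classes=10).to(device)

    # SGD with Nesterov momentum 0.9; weight decay 1e-5 on all parameters
    # except batch-norm ones.
    decay, no_decay = split_decay_params(model)
    optimizer = torch.optim.SGD(
        [{"params": decay, "weight_decay": 1e-5},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=0.1,  # assumed initial LR; not stated in the quoted setup
        momentum=0.9, nesterov=True)

    epochs = 100
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    scaler = GradScaler(enabled=(device == "cuda"))
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        model.train()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad(set_to_none=True)
            with autocast(enabled=(device == "cuda")):  # mixed-precision forward pass
                loss = criterion(model(images), labels)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        scheduler.step()  # cosine annealing, stepped once per epoch


if __name__ == "__main__":
    main()
```

The episodic meta-learners mentioned in the same row (2-way and 5-way classification with the prototypical loss and support/query episodes) follow a different training loop and are not shown here.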