Context is Environment

Authors: Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom-in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. Furthermore, training with context helps the model learn a better featurizer."
Researcher Affiliation | Collaboration | Sharut Gupta (Meta AI, MIT CSAIL, sharut@mit.edu); Stefanie Jegelka (MIT CSAIL, stefje@mit.edu); David Lopez-Paz and Kartik Ahuja (Meta AI, {dlp,kartikahuja}@meta.com)
Pseudocode | No | The paper describes the ICRM protocol in prose bullet points, but it does not present it in a formally labeled "Pseudocode" or "Algorithm" block. (A rough Python sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/facebookresearch/ICRM.
Open Datasets | Yes | "We use publicly available widely used image datasets for the purposes of benchmarking and comparison." (FEMNIST (Cohen et al., 2017) contains MNIST digits and handwritten letters...Rotated MNIST...WILDS Camelyon17 (Koh et al., 2021)...Tiny ImageNet-C and CIFAR10-C (Hendrycks and Dietterich, 2019)...ImageNet-R (Hendrycks et al., 2021))
Dataset Splits | Yes | FEMNIST (Cohen et al., 2017): 262 training users and 50 validation users. WILDS Camelyon17 (Koh et al., 2021): three hospitals contribute to the training set, a fourth is designated for validation, and the remaining hospital's data is used for testing. ImageNet-R (Hendrycks et al., 2021): validation is conducted using images from embroidery, miscellaneous, and graffiti categories, while the test environments incorporate images from art, deviantart, and origami categories. (See the environment-split sketch after this table.)
Hardware Specification | Yes | "Each experiment was performed on 8 NVIDIA Tesla V100 GPUs with 32GB accelerator RAM for a single training run. The CPUs used were Intel Xeon E5-2698 v4 processors with 20 cores and 384GB RAM."
Software Dependencies | No | The paper states: "All experiments use the PyTorch deep-learning framework." However, it does not specify a version for PyTorch or any other software dependency. (See the version-logging snippet after this table.)
Experiment Setup | Yes | "Our model is standardized to have 12 layers, 4 attention heads, and a 128-dimensional embedding space across all datasets. All models are optimized using the Adam optimizer (Kingma and Ba, 2014). We perform a random search of 5 trials across the hyperparameter range (refer to Table 6) for each algorithm." Table 6 includes details for "ResNet learning rate", "weight decay", and "not ResNet learning rate", each with a default value and a random-search distribution. (See the random-search sketch after this table.)
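
Since the paper gives no formal pseudocode (see the Pseudocode row), here is a rough, illustrative Python sketch of the in-context idea: a causal sequence model predicts each label from the current input plus the preceding inputs of the same environment, and is trained with plain empirical risk minimization over such sequences. Everything here (class and function names, shapes, the transformer-encoder backbone) is an assumption rather than the authors' code; only the 12-layer / 4-head / 128-dimensional configuration comes from the paper.

```python
import torch
import torch.nn as nn

# Illustrative ICRM-style sketch -- NOT the authors' implementation.
# A causal sequence model predicts y_t from (x_1, ..., x_t): earlier inputs
# from the same environment act as context for the current query.

class ICRMSketch(nn.Module):
    def __init__(self, input_dim, num_classes, d_model=128, nhead=4, num_layers=12):
        super().__init__()  # 12 layers / 4 heads / 128-dim follow the paper's setup
        self.embed = nn.Linear(input_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, input_dim)
        seq_len = x.size(1)
        # Causal mask: step t attends only to steps 1..t (its context so far).
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(x), mask=mask)
        return self.head(h)                     # (batch, seq_len, num_classes)

def icrm_step(model, optimizer, x_seq, y_seq):
    """One ERM step over a batch of sequences drawn from single environments."""
    logits = model(x_seq)
    loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       y_seq.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```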
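
The Dataset Splits row describes environment-level splits: whole users, hospitals, or rendition styles are held out, not random examples. Below is a minimal sketch of such a split, assuming data arrives as (x, y, environment) triples; the Camelyon17 3/1/1 hospital assignment in the comments follows the quoted description, while all names are hypothetical.

```python
# Illustrative environment-level split -- not the authors' data loader.
# Entire environments (users, hospitals, rendition styles) are held out,
# e.g. WILDS Camelyon17: hospitals {0,1,2} train, {3} validates, {4} tests.

def split_by_environment(samples, train_envs, val_envs, test_envs):
    """samples: iterable of (x, y, env) triples; *_envs: sets of env ids."""
    splits = {"train": [], "val": [], "test": []}
    for x, y, env in samples:
        if env in train_envs:
            splits["train"].append((x, y))
        elif env in val_envs:
            splits["val"].append((x, y))
        elif env in test_envs:
            splits["test"].append((x, y))
    return splits

# splits = split_by_environment(data, train_envs={0, 1, 2},
#                               val_envs={3}, test_envs={4})
```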
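
Because the Software Dependencies row flags that no versions are pinned, a reproduction should at least record its own stack. A small snippet using only standard PyTorch attributes:

```python
import sys
import torch

# Record the exact software stack alongside results, since the paper pins none.
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
print("cudnn :", torch.backends.cudnn.version())
```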
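
Finally, the Experiment Setup row fixes the architecture and optimizer but tunes the remaining hyperparameters with a 5-trial random search. A hedged sketch of that protocol: the sampling ranges below are placeholders (the actual defaults and distributions are in the paper's Table 6), and build_model / evaluate are hypothetical stand-ins for the training pipeline.

```python
import random
import torch

# Illustrative 5-trial random hyperparameter search. The ranges below are
# placeholders; the real default values and random distributions (including
# the separate ResNet / not-ResNet learning rates) are in the paper's Table 6.

def sample_hparams(rng):
    return {
        "lr": 10 ** rng.uniform(-5, -3),            # placeholder log-uniform range
        "weight_decay": 10 ** rng.uniform(-6, -2),  # placeholder range
    }

def random_search(build_model, evaluate, n_trials=5, seed=0):
    """build_model / evaluate are hypothetical stand-ins for the real pipeline."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        hp = sample_hparams(rng)
        model = build_model(num_layers=12, nhead=4, d_model=128)  # fixed, per paper
        optimizer = torch.optim.Adam(model.parameters(),
                                     lr=hp["lr"], weight_decay=hp["weight_decay"])
        val_acc = evaluate(model, optimizer)  # caller trains, returns val accuracy
        if best is None or val_acc > best[0]:
            best = (val_acc, hp)
    return best
```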