Context is Environment

Authors: Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom-in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. Furthermore, training with context helps the model learn a better featurizer."
Researcher Affiliation | Collaboration | Sharut Gupta (Meta AI, MIT CSAIL, sharut@mit.edu); Stefanie Jegelka (MIT CSAIL, stefje@mit.edu); David Lopez-Paz and Kartik Ahuja (Meta AI, {dlp,kartikahuja}@meta.com)
Pseudocode | No | The paper describes the ICRM protocol in prose bullet points, but it does not present it in a formally labeled "Pseudocode" or "Algorithm" block. (A rough Python sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/facebookresearch/ICRM.
Open Datasets | Yes | "We use publicly available widely used image datasets for the purposes of benchmarking and comparison." (FEMNIST (Cohen et al., 2017) contains MNIST digits and handwritten letters...Rotated MNIST...WILDS Camelyon17 (Koh et al., 2021)...Tiny ImageNet-C and CIFAR10-C (Hendrycks and Dietterich, 2019)...ImageNet-R (Hendrycks et al., 2021))
Dataset Splits | Yes | FEMNIST (Cohen et al., 2017): 262 training users and 50 validation users. WILDS Camelyon17 (Koh et al., 2021): three hospitals contribute to the training set, a fourth is designated for validation, and the remaining hospital's data is used for testing. ImageNet-R (Hendrycks et al., 2021): validation is conducted using images from embroidery, miscellaneous, and graffiti categories, while the test environments incorporate images from art, deviantart, and origami categories. (See the environment-split sketch after this table.)
Hardware Specification | Yes | "Each experiment was performed on 8 NVIDIA Tesla V100 GPUs with 32GB accelerator RAM for a single training run. The CPUs used were Intel Xeon E5-2698 v4 processors with 20 cores and 384GB RAM."
Software Dependencies | No | The paper states: "All experiments use the PyTorch deep-learning framework." However, it does not specify a version for PyTorch or any other software dependency. (See the version-logging snippet after this table.)
Experiment Setup | Yes | "Our model is standardized to have 12 layers, 4 attention heads, and a 128-dimensional embedding space across all datasets. All models are optimized using the Adam optimizer (Kingma and Ba, 2014). We perform a random search of 5 trials across the hyperparameter range (refer to Table 6) for each algorithm." Table 6 includes details for "ResNet learning rate", "weight decay", and "not ResNet learning rate", each with a default value and a random-search distribution. (See the random-search sketch after this table.)
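
Since the paper gives no formal pseudocode (see the Pseudocode row), here is a rough, illustrative Python sketch of the in-context idea: a causal sequence model predicts each label from the current input plus the preceding inputs of the same environment, and is trained with plain empirical risk minimization over such sequences. Everything here (class and function names, shapes, the transformer-encoder backbone) is an assumption rather than the authors' code; only the 12-layer / 4-head / 128-dimensional configuration comes from the paper.

```python
import torch
import torch.nn as nn

# Illustrative ICRM-style sketch -- NOT the authors' implementation.
# A causal sequence model predicts y_t from (x_1, ..., x_t): earlier inputs
# from the same environment act as context for the current query.

class ICRMSketch(nn.Module):
    def __init__(self, input_dim, num_classes, d_model=128, nhead=4, num_layers=12):
        super().__init__()  # 12 layers / 4 heads / 128-dim follow the paper's setup
        self.embed = nn.Linear(input_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                       # x: (batch, seq_len, input_dim)
        seq_len = x.size(1)
        # Causal mask: step t attends only to steps 1..t (its context so far).
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(x), mask=mask)
        return self.head(h)                     # (batch, seq_len, num_classes)

def icrm_step(model, optimizer, x_seq, y_seq):
    """One ERM step over a batch of sequences drawn from single environments."""
    logits = model(x_seq)
    loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       y_seq.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```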
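
The Dataset Splits row describes environment-level splits: whole users, hospitals, or rendition styles are held out, not random examples. Below is a minimal sketch of such a split, assuming data arrives as (x, y, environment) triples; the Camelyon17 3/1/1 hospital assignment in the comments follows the quoted description, while all names are hypothetical.

```python
# Illustrative environment-level split -- not the authors' data loader.
# Entire environments (users, hospitals, rendition styles) are held out,
# e.g. WILDS Camelyon17: hospitals {0,1,2} train, {3} validates, {4} tests.

def split_by_environment(samples, train_envs, val_envs, test_envs):
    """samples: iterable of (x, y, env) triples; *_envs: sets of env ids."""
    splits = {"train": [], "val": [], "test": []}
    for x, y, env in samples:
        if env in train_envs:
            splits["train"].append((x, y))
        elif env in val_envs:
            splits["val"].append((x, y))
        elif env in test_envs:
            splits["test"].append((x, y))
    return splits

# splits = split_by_environment(data, train_envs={0, 1, 2},
#                               val_envs={3}, test_envs={4})
```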
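
Because the Software Dependencies row flags that no versions are pinned, a reproduction should at least record its own stack. A small snippet using only standard PyTorch attributes:

```python
import sys
import torch

# Record the exact software stack alongside results, since the paper pins none.
print("python:", sys.version.split()[0])
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
print("cudnn :", torch.backends.cudnn.version())
```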
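
Finally, the Experiment Setup row fixes the architecture and optimizer but tunes the remaining hyperparameters with a 5-trial random search. A hedged sketch of that protocol: the sampling ranges below are placeholders (the actual defaults and distributions are in the paper's Table 6), and build_model / evaluate are hypothetical stand-ins for the training pipeline.

```python
import random
import torch

# Illustrative 5-trial random hyperparameter search. The ranges below are
# placeholders; the real default values and random distributions (including
# the separate ResNet / not-ResNet learning rates) are in the paper's Table 6.

def sample_hparams(rng):
    return {
        "lr": 10 ** rng.uniform(-5, -3),            # placeholder log-uniform range
        "weight_decay": 10 ** rng.uniform(-6, -2),  # placeholder range
    }

def random_search(build_model, evaluate, n_trials=5, seed=0):
    """build_model / evaluate are hypothetical stand-ins for the real pipeline."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        hp = sample_hparams(rng)
        model = build_model(num_layers=12, nhead=4, d_model=128)  # fixed, per paper
        optimizer = torch.optim.Adam(model.parameters(),
                                     lr=hp["lr"], weight_decay=hp["weight_decay"])
        val_acc = evaluate(model, optimizer)  # caller trains, returns val accuracy
        if best is None or val_acc > best[0]:
            best = (val_acc, hp)
    return best
```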