Contextual Reliability: When Different Features Matter in Different Contexts

Authors: Gaurav Rohit Ghosal, Amrith Setlur, Daniel S. Brown, Anca Dragan, Aditi Raghunathan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability. Finally, we consider a variety of semi-synthetic and real-world datasets that require contextual reliability, ranging from control to image classification and motion forecasting with real-world autonomous vehicle data from the Waymo Open Motion Dataset (WOMD) (Ettinger et al., 2021). ENP offers consistent gains over baselines across all these settings, offering gains of around 15% in control environments, 6% in image classification, and 5% on WOMD.
Researcher Affiliation | Academia | University of California, Berkeley; Carnegie Mellon University; University of Utah. Correspondence to: Gaurav Ghosal <gauravrghosal@berkeley.edu>, Amrith Setlur <asetlur@cs.cmu.edu>.
Pseudocode | No | The paper describes algorithms (e.g., ENP) but does not present them in pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to open-sourced code by other authors for baselines and base models (e.g., https://github.com/pimdh/causal-confusion/, https://github.com/stepankonev/waymo-motion-prediction-challenge-2022-multipathplus-plus), but does not state that the code for *their own* methodology (ENP) is publicly available.
Open Datasets | Yes | Finally, we consider a variety of semi-synthetic and real-world datasets that require contextual reliability, ranging from control to image classification and motion forecasting with real-world autonomous vehicle data from the Waymo Open Motion Dataset (WOMD) (Ettinger et al., 2021). We adapt the standard Waterbirds robustness benchmark demonstrated in (Sagawa et al., 2019a) to generate a dataset where the foreground bird images are blurred and randomly cropped with probability 0.05.
Dataset Splits | Yes | For the ERM and GT-Aug experiments, we used the standard weight decay parameter of 1e-4 and tuned the best epoch using the validation dataset.
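The quoted "tuned the best epoch using the validation dataset" protocol amounts to checkpoint selection by validation loss: record one loss per epoch and keep the epoch that minimizes it. A minimal sketch in plain Python (the function name and the loss values below are illustrative, not from the paper):

```python
# Hypothetical sketch of best-epoch selection on a validation set.
# The losses are made-up illustrative numbers, not paper results.

def best_epoch(val_losses):
    """Return the index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

val_losses = [0.90, 0.72, 0.65, 0.68, 0.74]  # one entry per epoch
print(best_epoch(val_losses))  # epoch 2 has the lowest loss
```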
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | Table 4 (MultiPath++ Training Hyperparameters) lists the hyperparameters used to train all MultiPath++ models in the WOMD experiments: batch size 42; learning rate 1e-4; gradient norm clipping 0.4; mask history percentage 0.15; total training epochs 120; learning rate scheduler of type reduce-on-plateau with factor 0.5 and patience 20.
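The Table 4 settings can be sketched as a self-contained configuration plus a minimal reduce-on-plateau schedule in plain Python. The class and field names here are illustrative assumptions; the paper's WOMD experiments use a framework scheduler (e.g., the one in their training stack), not this exact code:

```python
# Sketch of the Table 4 training configuration and a minimal
# reduce-on-plateau learning-rate schedule. Names are hypothetical.
from dataclasses import dataclass


@dataclass
class MultiPathPPConfig:
    batch_size: int = 42
    learning_rate: float = 1e-4
    grad_norm_clip: float = 0.4
    mask_history_pct: float = 0.15
    total_epochs: int = 120
    scheduler_factor: float = 0.5
    scheduler_patience: int = 20


class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after more than
    `patience` epochs without a new best validation loss."""

    def __init__(self, lr, factor, patience):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


cfg = MultiPathPPConfig()
sched = ReduceOnPlateau(cfg.learning_rate, cfg.scheduler_factor,
                        cfg.scheduler_patience)
sched.step(1.0)            # first epoch sets the best loss
for _ in range(21):        # 21 stagnant epochs exceed patience = 20
    lr = sched.step(1.0)
print(lr)  # 5e-05: the initial 1e-4 halved once
```

With factor 0.5 and patience 20, the learning rate is halved only after the validation loss has failed to improve for more than 20 consecutive epochs, matching the Table 4 scheduler settings.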