Contextual Reliability: When Different Features Matter in Different Contexts
Authors: Gaurav Rohit Ghosal, Amrith Setlur, Daniel S. Brown, Anca Dragan, Aditi Raghunathan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability. Finally, we consider a variety of semi-synthetic and real-world datasets that require contextual reliability ranging from control to image classification and motion forecasting with real-world autonomous vehicle data from the Waymo Open Motion Dataset (WOMD) (Ettinger et al., 2021). ENP offers consistent gains over baselines across all these settings, offering gains of around 15% in control environments, 6% in image classification, and 5% on WOMD. |
| Researcher Affiliation | Academia | 1 University of California, Berkeley 2 Carnegie Mellon University 3 University of Utah. Correspondence to: Gaurav Ghosal <gauravrghosal@berkeley.edu>, Amrith Setlur <asetlur@cs.cmu.edu>. |
| Pseudocode | No | The paper describes algorithms (e.g., ENP) but does not provide them in pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to open-sourced code by other authors for baselines and base models (e.g., https://github.com/pimdh/causal-confusion/, https://github.com/stepankonev/waymo-motion-prediction-challenge-2022-multipathplus-plus), but does not state that the code for *their own* methodology (ENP) is publicly available. |
| Open Datasets | Yes | Finally, we consider a variety of semi-synthetic and real-world datasets that require contextual reliability ranging from control to image classification and motion forecasting with real-world autonomous vehicle data from the Waymo Open Motion Dataset (WOMD) (Ettinger et al., 2021). We adapt the standard Waterbirds robustness benchmark demonstrated in (Sagawa et al., 2019a) to generate a dataset where the foreground bird images are blurred and randomly cropped with probability 0.05. |
| Dataset Splits | Yes | For the ERM and GT-Aug experiments, we used the standard weight decay parameter of 1e-4 and tuned the best epoch using the validation dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | Table 4. MultiPath++ Training Hyperparameters. We show the set of hyperparameters used in training all MultiPath++ models in our WOMD experiments. Batch size: 42; learning rate: 1e-4; gradient norm clipping: 0.4; mask history percentage: 0.15; total training epochs: 120; learning rate scheduler type: reduce on plateau; learning rate scheduler factor: 0.5; learning rate scheduler patience: 20. |
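
The Table 4 hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when trying to reproduce the WOMD runs. The sketch below is illustrative only: the field names, the `ReduceOnPlateau` class, and the `simulate` helper are hypothetical and do not come from the paper, and the choice of validation loss as the monitored metric is an assumption. The scheduler is a simplified reduce-on-plateau rule (no improvement threshold or cooldown), not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 4; the field names are illustrative."""
    batch_size: int = 42
    learning_rate: float = 1e-4
    grad_norm_clip: float = 0.4
    mask_history_percentage: float = 0.15
    total_epochs: int = 120
    scheduler_factor: float = 0.5
    scheduler_patience: int = 20


class ReduceOnPlateau:
    """Simplified reduce-on-plateau rule (assumption: lower metric is better)."""

    def __init__(self, lr, factor, patience):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Reset the counter on improvement, otherwise count a stalled epoch.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Scale the learning rate once the stall exceeds the patience.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


def simulate(cfg, val_losses):
    """Return the learning rate in effect after each epoch's metric."""
    sched = ReduceOnPlateau(
        cfg.learning_rate, cfg.scheduler_factor, cfg.scheduler_patience
    )
    return [sched.step(loss) for loss in val_losses]
```

With these values, a metric that stops improving leaves the learning rate at 1e-4 for 20 stalled epochs and halves it on the 21st. The optimizer and the exact metric the authors monitored are not stated in the quoted text, so they are left out of the sketch.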