Okapi: Generalising Better by Making Statistical Matches Match
Authors: Myles Bartlett, Sara Romiti, Viktoriia Sharmanska, Novi Quadrianto
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on the WILDS 2.0 datasets [63], which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets as well as the CivilComments (a binary classification task) text dataset. |
| Researcher Affiliation | Academia | Myles Bartlett (1), Sara Romiti (1), Viktoriia Sharmanska (1,2), Novi Quadrianto (1,3,4) — (1) Predictive Analytics Lab, University of Sussex; (2) Imperial College London; (3) BCAM Severo Ochoa Strategic Lab on Trustworthy Machine Learning; (4) Monash University, Indonesia |
| Pseudocode | Yes | See Fig. 2 for a pictorial representation of these steps and Appendix G for reference pseudocode. |
| Open Source Code | Yes | Code for our paper is publicly available at https://github.com/wearepal/okapi/. |
| Open Datasets | Yes | We evaluate Okapi on three datasets taken from the WILDS 2.0 benchmark [63]. |
| Dataset Splits | Yes | Following [63], we compute the mean and standard deviation (shown in parentheses) over multiple runs for both the ID and OOD test sets; these runs are conducted with 3 different random seeds for iWildCam and 5 pre-defined cross-validation folds for PovertyMap. We attribute this partly to the high variance of the model-selection procedure (inherited from [63]), which relies on intermittently-computed validation performance (which does not consistently align with test performance) to determine the final model. |
| Hardware Specification | No | No; however, we do provide estimates of the carbon footprint for a single run of our method and of the ERM and FixMatch baselines for the iWildCam dataset. |
| Software Dependencies | No | Only partial details are given: "PyTorch: an imperative style, high-performance deep learning library"; "with all models trained with a pre-trained DistilBERT [64] backbone"; "with us opting for a ConvNeXt [47] architecture over a ResNet one". |
| Experiment Setup | Yes | Yes; all implementation details, including those related to optimisation and hyperparameter-selection, are given in Appendix D. |
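The evaluation protocol in the Dataset Splits row reports the mean and standard deviation over repeated runs (3 random seeds for iWildCam, 5 cross-validation folds for PovertyMap). A minimal sketch of that aggregation step; the helper name `aggregate_runs` and the example scores are illustrative assumptions, not values from the paper:

```python
import statistics

def aggregate_runs(per_run_scores):
    """Aggregate one metric across repeated runs (seeds or CV folds).

    Returns (mean, sample standard deviation), matching the
    "mean (std)" reporting convention used by WILDS.
    NOTE: hypothetical helper, not part of the Okapi codebase.
    """
    mean = statistics.mean(per_run_scores)
    std = statistics.stdev(per_run_scores) if len(per_run_scores) > 1 else 0.0
    return mean, std

# Illustrative OOD scores from 3 seeded runs (made-up numbers).
mean, std = aggregate_runs([0.31, 0.33, 0.35])
print(f"{mean:.2f} ({std:.2f})")
```

The sample (n-1) standard deviation is the conventional choice when each seed or fold is treated as an independent draw of the training procedure.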