Okapi: Generalising Better by Making Statistical Matches Match

Authors: Myles Bartlett, Sara Romiti, Viktoriia Sharmanska, Novi Quadrianto

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on the WILDS 2.0 datasets [63], which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets as well as the CivilComments (a binary classification task) text dataset.
Researcher Affiliation | Academia | Myles Bartlett (1), Sara Romiti (1), Viktoriia Sharmanska (1, 2), Novi Quadrianto (1, 3, 4). 1: Predictive Analytics Lab, University of Sussex; 2: Imperial College London; 3: BCAM Severo Ochoa Strategic Lab on Trustworthy Machine Learning; 4: Monash University, Indonesia.
Pseudocode | Yes | These steps are illustrated pictorially in Fig. 2 and as reference pseudocode in Appendix G.
Open Source Code | Yes | Code for our paper is publicly available at https://github.com/wearepal/okapi/.
Open Datasets | Yes | We evaluate Okapi on three datasets taken from the WILDS 2.0 benchmark [63]. (A minimal loading sketch for these datasets is given in the first example below the table.)
Dataset Splits | Yes | Following [63], we compute the mean and standard deviation (shown in parentheses) over multiple runs for both ID and OOD test sets, with these runs conducted with 3 different random seeds and 5 pre-defined cross-validation folds for iWildCam and PovertyMap, respectively. We attribute this partly to the high variance of the model-selection procedure (inherited from [63]), which relies on intermittently-computed validation performance (which does not consistently align with test performance) to determine the final model. (The reporting convention is illustrated in the second example below the table.)
Hardware Specification | No | No; however, we do provide estimates of the carbon footprint for a single run of our method and of the ERM and FixMatch baselines for the iWildCam dataset.
Software Dependencies | No | PyTorch: An imperative style, high-performance deep learning library. ... with all models trained with a pre-trained DistilBERT [64] backbone ... with us opting for a ConvNeXt [47] architecture over a ResNet one. (Instantiating these backbones is shown in the third example below the table.)
Experiment Setup | Yes | Yes; all implementation details, including those related to optimisation and hyperparameter selection, are given in Appendix D.
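
Since the evaluated datasets all come from the WILDS 2.0 benchmark, the following is a minimal sketch of how they can be loaded with the official `wilds` package. The image size, batch size, and the `extra_unlabeled` split name are illustrative assumptions, not settings taken from the paper (its settings are in Appendix D).

```python
# Minimal sketch: loading a WILDS 2.0 dataset (labelled + unlabelled splits).
# Assumes the official `wilds` package (pip install wilds). Image size, batch
# size, and the unlabelled split name are illustrative assumptions, not the
# paper's settings.
from torchvision import transforms
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader

transform = transforms.Compose(
    [transforms.Resize((448, 448)), transforms.ToTensor()]
)

# Labelled iWildCam data.
labeled = get_dataset(dataset="iwildcam", download=True)
train_data = labeled.get_subset("train", transform=transform)
train_loader = get_train_loader("standard", train_data, batch_size=16)

# Unlabelled data introduced by the WILDS 2.0 update.
unlabeled = get_dataset(dataset="iwildcam", unlabeled=True, download=True)
unlabeled_data = unlabeled.get_subset("extra_unlabeled", transform=transform)
unlabeled_loader = get_train_loader("standard", unlabeled_data, batch_size=16)
```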
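The reporting protocol in the Dataset Splits row (mean and standard deviation over 3 random seeds for iWildCam and 5 cross-validation folds for PovertyMap) amounts to the aggregation below. The scores are placeholders, not results from the paper, and the choice of sample standard deviation (ddof=1) is an assumption, as the paper does not state which convention it uses.

```python
import numpy as np

# Placeholder per-run OOD scores -- NOT results from the paper.
iwildcam_seeds = np.array([0.31, 0.33, 0.32])                 # 3 random seeds
povertymap_folds = np.array([0.51, 0.49, 0.53, 0.50, 0.52])   # 5 CV folds

for name, scores in [("iWildCam", iwildcam_seeds), ("PovertyMap", povertymap_folds)]:
    # Mean with the standard deviation shown in parentheses, matching the
    # paper's reporting format; ddof=1 (sample std) is an assumption.
    print(f"{name}: {scores.mean():.3f} ({scores.std(ddof=1):.3f})")
```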
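The Software Dependencies row names a pre-trained DistilBERT backbone for the text task and a ConvNeXt (rather than ResNet) backbone for the image tasks. A sketch of instantiating both with standard libraries follows; the specific checkpoint names and the ConvNeXt variant (tiny) are assumptions for illustration, as the row does not pin them down.

```python
# Sketch of the two backbone families named in the Software Dependencies row.
# Checkpoint names and the ConvNeXt size are illustrative assumptions.
from torchvision import models
from transformers import DistilBertModel, DistilBertTokenizerFast

# Text task (CivilComments): a pre-trained DistilBERT encoder.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
text_backbone = DistilBertModel.from_pretrained("distilbert-base-uncased")

# Image tasks (iWildCam, PovertyMap): ConvNeXt instead of ResNet.
# Requires torchvision >= 0.13 for the weights enum.
image_backbone = models.convnext_tiny(
    weights=models.ConvNeXt_Tiny_Weights.IMAGENET1K_V1
)
```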