Coping with Label Shift via Distributionally Robust Optimisation
Authors: Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, through experiments on CIFAR-100 and ImageNet, we show that our technique can significantly improve performance over a number of baselines in settings where label shift is present. |
| Researcher Affiliation | Collaboration | Jingzhao Zhang, Massachusetts Institute of Technology, jzhzhang@mit.edu; Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Google Research, {adityakmenon, aveit, bsrinadh, sanjivk}@google.com; Suvrit Sra, Massachusetts Institute of Technology, suvrit@mit.edu |
| Pseudocode | Yes | Algorithm 1 ADVSHIFT(θ0, γc, λ, NNOpt, pemp, ηπ) |
| Open Source Code | No | The paper does not explicitly state that code is released or provide a link to a code repository. |
| Open Datasets | Yes | Finally, through experiments on CIFAR-100 and ImageNet, we show that our technique can significantly improve performance over a number of baselines in settings where label shift is present. |
| Dataset Splits | Yes | First, we train a model on the training set and compute its error distribution on the validation set. Next, we pick a threshold τ on the allowable KL divergence between the train and target distribution and find the adversarial distribution within this threshold which achieves the worst-possible validation error. |
| Hardware Specification | No | The paper mentions training a 'ResNet-50' model, but it does not specify any particular hardware components like GPU models (e.g., NVIDIA A100), CPU types, or cloud computing instances with their specifications. |
| Software Dependencies | No | The paper mentions 'Tensorflow' in Appendix E ('Given that our implementation is based on Tensorflow...'), but it does not specify a version number for Tensorflow or any other key software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | To evaluate the proposed method, we use the standard image classification setup of training a ResNet-50 on ImageNet using SGD with momentum as the neural network optimiser. All algorithms are run for 90 epochs... We set 2γcλ = 1 in Algorithm 1 for simplicity. For learning the adversarial distribution, we only tune the adversarial learning rate ηπ. ... we clip the label-wise loss at value 2. Second, we add a constant ϵ term on the adversarial distribution to avoid the adversarial distribution reaching any of the vertices on the simplex. |
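The setup quoted above describes the core of Algorithm 1 (ADVSHIFT): alternate SGD steps on the model with ascent steps on an adversarial label distribution, clipping the label-wise loss at 2 and mixing in a constant ϵ term so the distribution never reaches a simplex vertex. The sketch below illustrates one such adversarial update with NumPy; it is a simplified illustration based only on the paper's description, not the authors' TensorFlow implementation, and the function name `adv_shift_update` and the exponentiated-gradient form of the step are assumptions.

```python
import numpy as np

def adv_shift_update(pi, classwise_loss, eta_pi, eps=1e-3, clip=2.0):
    """One hypothetical ascent step on the adversarial label distribution pi.

    pi             -- current adversarial distribution over labels (sums to 1)
    classwise_loss -- per-label loss of the current model
    eta_pi         -- adversarial learning rate (the only tuned hyperparameter
                      for the adversary, per the quoted setup)
    """
    loss = np.minimum(classwise_loss, clip)   # clip the label-wise loss at 2
    pi = pi * np.exp(eta_pi * loss)           # multiplicative-weights ascent step
    pi = pi / pi.sum()                        # renormalise onto the simplex
    pi = (1 - eps) * pi + eps / len(pi)       # mix with uniform: stay off vertices
    return pi

# Toy usage: 10 classes, uniform empirical training label marginal.
p_emp = np.full(10, 0.1)
rng = np.random.default_rng(0)
pi = adv_shift_update(p_emp.copy(), rng.random(10) * 3.0, eta_pi=0.5)
weights = pi / p_emp  # per-label importance weights for the reweighted training loss
```

The ϵ-mixing step keeps every entry of `pi` strictly positive, which keeps the importance weights `pi / p_emp` finite and the minimax objective well behaved.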