Overparameterisation and worst-case generalisation: friend or foe?
Authors: Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify that with such post-hoc correction, overparameterisation can improve average and worst-case performance. Table 1 summarises the test set results on all datasets. |
| Researcher Affiliation | Industry | Aditya Krishna Menon, Ankit Singh Rawat & Sanjiv Kumar Google Research New York, NY {adityakmenon,ankitsrawat,sanjivk}@google.com |
| Pseudocode | No | The paper describes the correction procedures in prose (Sections 4.1 and 4.2) but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or provide any links to a code repository. |
| Open Datasets | Yes | In the sequel, we shall make extensive use of three datasets from Sagawa et al. (2020a;b), each of which involves binary labels y ∈ Y and a binary attribute a(x) ∈ A: (i) synth, a synthetic dataset where X ⊆ R^200, Y = {±1}, and A = {±1}. (ii) waterbirds, a dataset of bird images with Y = {land bird, water bird} corresponding to the bird type, and A = {land background, water background} corresponding to the background. (iii) celebA, a dataset of celebrity images with Y = {blond, dark} corresponding to individuals' hair colour, and A = {male, female}. (Citing Sagawa et al. (2020a;b) for datasets.) |
| Dataset Splits | Yes | We measure both the average and worst-subgroup errors on both the train and test set, repeating each experiment 5 times. We apply post-hoc correction to these learned models, via classifier retraining (CRT) on the learned representations, using a linear logistic regression model with subsampling of the dominant subgroups per Sagawa et al. (2020b); and threshold correction (THR) on the decision scores, using a holdout set to estimate thresholds {t_a : a ∈ {±1}} that minimise the worst-subgroup error. For waterbirds, we use the holdout set from Sagawa et al. (2020a); for celebA, we use the standard holdout set; and for synth, we construct a holdout set using 20% of the training samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It only mentions training 'models'. |
| Software Dependencies | No | The paper mentions 'Logistic Regression package in sklearn' but does not specify the version of sklearn or any other software dependencies like TensorFlow/PyTorch versions for the ResNet-50 experiments. |
| Experiment Setup | Yes | For the ResNet-50 experiments... We train the models using SGD with a momentum value of 0.9. We use a batch size of 128, weight decay 10^-4, and a learning rate decayed according to a cosine schedule. We train with a base learning rate of 10^-4 for 1000 epochs on waterbirds, and a base learning rate of 10^-2 for 50 epochs on celebA. |
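
The Dataset Splits row describes threshold correction (THR) only in prose: per-attribute thresholds t_a are estimated on a holdout set so as to minimise the worst-subgroup error. The sketch below is one way that step might look, not the authors' implementation (no code was released). It assumes labels and attributes are encoded as ±1, that a sample is predicted positive when its score exceeds t_a, and that a simple search over midpoints between observed scores is adequate. The function names `subgroup_errors`, `fit_thresholds`, and `predict` are hypothetical.

```python
import numpy as np


def subgroup_errors(scores, labels, threshold):
    """Error rates on the negative and positive class at a given threshold.

    Assumed decision rule: predict +1 iff score > threshold.
    """
    preds = np.where(scores > threshold, 1, -1)
    errors = []
    for y in (-1, 1):
        mask = labels == y
        # An empty subgroup contributes no error.
        errors.append(float(np.mean(preds[mask] != y)) if mask.any() else 0.0)
    return errors


def fit_thresholds(scores, labels, attrs):
    """Pick one threshold per attribute value on holdout data.

    Each threshold t_a only affects the two subgroups (y=-1, a) and
    (y=+1, a), so the overall worst-subgroup error can be minimised by
    minimising the larger of those two errors separately for each a.
    """
    thresholds = {}
    for a in np.unique(attrs):
        in_group = attrs == a
        s, y = scores[in_group], labels[in_group]
        u = np.unique(s)
        # Candidates: below all scores, between neighbours, above all scores.
        candidates = np.concatenate(
            ([u[0] - 1.0], (u[:-1] + u[1:]) / 2.0, [u[-1] + 1.0])
        )
        worst = [max(subgroup_errors(s, y, t)) for t in candidates]
        thresholds[int(a)] = float(candidates[int(np.argmin(worst))])
    return thresholds


def predict(scores, attrs, thresholds):
    """Apply the attribute-specific thresholds to decision scores."""
    t = np.array([thresholds[int(a)] for a in attrs])
    return np.where(scores > t, 1, -1)
```

In use, `fit_thresholds` would be called on holdout scores from the trained model, and `predict` on test scores. Note that this form of correction requires the attribute a(x) to be available at prediction time.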