Understanding Why Generalized Reweighting Does Not Improve Over ERM
Authors: Runtian Zhai, Chen Dan, J. Zico Kolter, Pradeep Ravikumar
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical (with a supporting experiment) | We prove that GRW and ERM have (almost) equivalent implicit biases, in the sense that the points they converge to are very close to each other, under a much more general setting than those used in previous work. Thus, GRW cannot improve over ERM because it does not yield a significantly different model. We are the first to extend this line of theoretical results (i) to wide neural networks, (ii) to reweighting methods with dynamic weights, (iii) to regression tasks, and (iv) to methods with L2 regularization. ... We use a simple experiment to demonstrate the correctness of this result. The experiment is conducted on a training set of six MNIST images... The results are presented in Figure 1... (See the first sketch after the table for a minimal illustration of this equivalence.) |
| Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh, PA, USA 15213 {rzhai,cdan,zkolter,pradeepr}@cs.cmu.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. |
| Open Datasets | Yes | The experiment is conducted on a training set of six MNIST images, five of which are digit 0 and one is digit 1. |
| Dataset Splits | No | The paper mentions a 'training set' but does not specify a validation split or any methodology for partitioning the data. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We use a 784-dimensional linear model and run ERM, importance weighting and Group DRO. ... with L2 regularization. ... Left two: µ = 0.1; Right two: µ = 10. ... run ERM, importance weighting and Group DRO on the training set with 6 MNIST images ... with the logistic loss and the polynomially-tailed loss (Eqn. (19), with α = 1, β = 0 and ℓ_left being the logistic loss shifted to make the overall loss function continuous) on this dataset for 10 million epochs... (See the second sketch after the table for a reconstruction of this setup.) |
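The Research Type row quotes the paper's central claim that GRW and ERM share (almost) equivalent implicit biases. The sketch below is our illustration, not the authors' code: it shows the simplest instance of that phenomenon, overparameterized least-squares regression, where gradient descent from a shared zero initialization reaches the same minimum-norm interpolator whether or not the samples are reweighted. The data, weights, learning rate, and step count are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code): compare the implicit
# bias of ERM against a fixed-weight GRW scheme on an overparameterized
# least-squares problem. Both runs interpolate the training data and the
# two solutions coincide (both are the minimum-norm interpolator within
# the row span of X), mirroring the equivalence the paper proves in far
# greater generality.
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 784                            # 6 samples, 784 features
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.normal(size=n)

def train(sample_weights, lr=0.5, steps=50_000):
    w = np.zeros(d)                      # identical initialization for both runs
    for _ in range(steps):
        residual = X @ w - y             # per-sample regression errors
        w -= lr * X.T @ (sample_weights * residual) / n
    return w

uniform = np.ones(n)                     # ERM: equal weights
skewed = np.array([5.0, 1.0, 1.0, 1.0, 1.0, 1.0])
skewed *= n / skewed.sum()               # GRW: fixed non-uniform weights, mean 1

w_erm, w_grw = train(uniform), train(skewed)
print("ERM train MSE :", np.mean((X @ w_erm - y) ** 2))
print("GRW train MSE :", np.mean((X @ w_grw - y) ** 2))
print("relative gap  :", np.linalg.norm(w_erm - w_grw) / np.linalg.norm(w_erm))
```

The printed relative gap sits at numerical precision: the reweighted run converges to essentially the same point as ERM, so the reweighting washes out.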
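The Experiment Setup row can likewise be mirrored in a short script. The following is an assumption-laden reconstruction rather than the authors' code: random Gaussian vectors stand in for the six MNIST images, the run uses far fewer than the reported 10 million epochs, the learning rate and Group DRO step size `eta_q` are illustrative choices, and the polynomially-tailed loss of Eqn. (19) is omitted for brevity.

```python
# Assumed reconstruction of the reported setup, not the authors' code:
# a 784-dimensional linear classifier trained with ERM, importance
# weighting (IW), and Group DRO on six points with a 5:1 class imbalance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 784)) / 28.0     # stand-ins for flattened 28x28 images
y = np.array([-1, -1, -1, -1, -1, 1])    # five digit-0s, one digit-1, as +/-1
groups = (y == 1).astype(int)            # group 0: majority, group 1: minority
counts = np.bincount(groups)             # [5, 1]

def train(method, lr=0.1, mu=0.1, epochs=50_000, eta_q=0.01):
    w = np.zeros(784)
    q = np.array([0.5, 0.5])             # Group DRO's distribution over groups
    for _ in range(epochs):
        margins = y * (X @ w)
        if method == "erm":              # uniform sample weights
            s = np.full(6, 1 / 6)
        elif method == "iw":             # inverse class-frequency weights
            s = 1.0 / (2 * counts[groups])
        elif method == "gdro":           # exponentiated-gradient ascent on q
            group_loss = np.array([np.logaddexp(0, -margins[groups == g]).mean()
                                   for g in (0, 1)])
            q = q * np.exp(eta_q * group_loss)
            q /= q.sum()
            s = q[groups] / counts[groups]
        # gradient of the weighted logistic loss plus the L2 term mu * w
        sigma = np.exp(-np.logaddexp(0, margins))   # 1 / (1 + e^{margin})
        grad = -(X * (s * sigma * y)[:, None]).sum(axis=0) + mu * w
        w -= lr * grad
    return w

for method in ("erm", "iw", "gdro"):
    w = train(method)
    print(f"{method:5s} margins:", np.round(y * (X @ w), 3))
```

Sweeping `mu` here mirrors the quoted Figure 1 comparison between µ = 0.1 and µ = 10, the knob the paper uses to study when L2 regularization lets GRW deviate from ERM.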