Understanding Why Generalized Reweighting Does Not Improve Over ERM

Authors: Runtian Zhai, Chen Dan, J Zico Kolter, Pradeep Kumar Ravikumar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that GRW and ERM have (almost) equivalent implicit biases, in the sense that the points they converge to are very close to each other, under a much more general setting than those used in previous work. Thus, GRW cannot improve over ERM because it does not yield a significantly different model. We are the first to extend this line of theoretical results (i) to wide neural networks, (ii) to reweighting methods with dynamic weights, (iii) to regression tasks, and (iv) to methods with L2 regularization. ... We use a simple experiment to demonstrate the correctness of this result. The experiment is conducted on a training set of six MNIST images... The results are presented in Figure 1...
Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh, PA, USA 15213; {rzhai,cdan,zkolter,pradeepr}@cs.cmu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. (A generic GRW update is sketched after this table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | The experiment is conducted on a training set of six MNIST images, five of which are digit 0 and one is digit 1. (A dataset-construction sketch follows this table.)
Dataset Splits | No | The paper mentions a training set but does not specify any validation split or data-partitioning methodology.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We use a 784-dimensional linear model and run ERM, importance weighting and Group DRO. ... with L2 regularization. ... Left two: µ = 0.1; Right two: µ = 10. ... run ERM, importance weighting and Group DRO on the training set with 6 MNIST images ... with the logistic loss and the polynomially-tailed loss (Eqn. (19), with α = 1, β = 0 and ℓ_left being the logistic loss shifted to make the overall loss function continuous) on this dataset for 10 million epochs... (A reproduction sketch follows this table.)
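
Since the paper states GRW in prose rather than pseudocode, the following is a minimal Python sketch of a generic GRW update, written for this report. The names `logistic_grad` and `grw_step` and the default learning rate are illustrative choices, not the authors' code; the point is only that ERM, static importance weighting, and Group DRO differ solely in how the per-sample weights q are chosen.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * <w, x>)) for y in {-1, +1}."""
    return -y * x / (1.0 + np.exp(y * (w @ x)))

def grw_step(w, X, y, q, lr=0.1):
    """One generalized-reweighting (GRW) step: gradient descent on the
    q-weighted empirical risk. ERM is the special case q = (1/n, ..., 1/n);
    static importance weighting fixes q once; Group DRO re-adjusts q
    between steps to upweight the worst-off group."""
    grads = np.stack([logistic_grad(w, X[i], y[i]) for i in range(len(y))])
    return w - lr * (q @ grads)
```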
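
The paper identifies the dataset (six MNIST training images: five of digit 0, one of digit 1) but not which specific images or label encoding were used. A plausible reconstruction, assuming torchvision for the download and taking the first examples of each class:

```python
import numpy as np
from torchvision import datasets

# Download MNIST and flatten each 28x28 image into a 784-dim vector in [0, 1].
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.numpy().reshape(-1, 784) / 255.0
labels = mnist.targets.numpy()

# Five digit-0 images and one digit-1 image, matching the paper's setup.
# Which six images the authors used is not specified; we take the first of each.
idx0 = np.where(labels == 0)[0][:5]
idx1 = np.where(labels == 1)[0][:1]
X = np.concatenate([images[idx0], images[idx1]])  # shape (6, 784)
y = np.array([-1] * 5 + [1])                      # {-1, +1} labels (our encoding)
```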
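
Finally, a minimal reproduction sketch of the reported setup: a 784-dimensional linear model trained with the logistic loss plus L2 regularization, comparing ERM against importance weighting for µ ∈ {0.1, 10}. It reuses X and y from the dataset sketch above. The learning rate, step count, and inverse-class-frequency weights q_iw are assumptions made for speed; the paper trains for 10 million epochs and also runs Group DRO and a polynomially-tailed loss, both omitted here for brevity.

```python
import numpy as np

def train(X, y, q, mu, lr=0.01, steps=100_000):
    """Full-batch gradient descent on the q-weighted logistic loss
    plus L2 regularization (mu / 2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of the q-weighted logistic loss plus the L2 term.
        grad = -((q * y) / (1.0 + np.exp(margins))) @ X + mu * w
        w -= lr * grad
    return w

n = len(y)
q_erm = np.full(n, 1.0 / n)          # ERM: uniform weights
q_iw = np.array([0.2] * 5 + [1.0])   # importance weighting: inverse class frequency
q_iw /= q_iw.sum()                   # normalize to sum to 1

for mu in (0.1, 10.0):
    w_erm = train(X, y, q_erm, mu)
    w_iw = train(X, y, q_iw, mu)
    print(f"mu={mu}: ||w_ERM - w_IW|| = {np.linalg.norm(w_erm - w_iw):.4f}")
```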