Understanding Why Generalized Reweighting Does Not Improve Over ERM

Authors: Runtian Zhai, Chen Dan, J Zico Kolter, Pradeep Kumar Ravikumar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that GRW and ERM have (almost) equivalent implicit biases, in the sense that the points they converge to are very close to each other, under a much more general setting than those used in previous work. Thus, GRW cannot improve over ERM because it does not yield a significantly different model. We are the first to extend this line of theoretical results (i) to wide neural networks, (ii) to reweighting methods with dynamic weights, (iii) to regression tasks, and (iv) to methods with L2 regularization. ... We use a simple experiment to demonstrate the correctness of this result. The experiment is conducted on a training set of six MNIST images... The results are presented in Figure 1...
Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh, PA, USA 15213; {rzhai,cdan,zkolter,pradeepr}@cs.cmu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. (A generic GRW update is sketched after this table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | The experiment is conducted on a training set of six MNIST images, five of which are digit 0 and one is digit 1. (A dataset-construction sketch follows this table.)
Dataset Splits | No | The paper mentions a training set but does not specify any validation split or data-partitioning methodology.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We use a 784-dimensional linear model and run ERM, importance weighting and Group DRO. ... with L2 regularization. ... Left two: µ = 0.1; Right two: µ = 10. ... run ERM, importance weighting and Group DRO on the training set with 6 MNIST images ... with the logistic loss and the polynomially-tailed loss (Eqn. (19), with α = 1, β = 0 and ℓ_left being the logistic loss shifted to make the overall loss function continuous) on this dataset for 10 million epochs... (A reproduction sketch follows this table.)
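
Since the paper states GRW in prose rather than pseudocode, the following is a minimal Python sketch of a generic GRW update, written for this report. The names `logistic_grad` and `grw_step` and the default learning rate are illustrative choices, not the authors' code; the point is only that ERM, static importance weighting, and Group DRO differ solely in how the per-sample weights q are chosen.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * <w, x>)) for y in {-1, +1}."""
    return -y * x / (1.0 + np.exp(y * (w @ x)))

def grw_step(w, X, y, q, lr=0.1):
    """One generalized-reweighting (GRW) step: gradient descent on the
    q-weighted empirical risk. ERM is the special case q = (1/n, ..., 1/n);
    static importance weighting fixes q once; Group DRO re-adjusts q
    between steps to upweight the worst-off group."""
    grads = np.stack([logistic_grad(w, X[i], y[i]) for i in range(len(y))])
    return w - lr * (q @ grads)
```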
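
The paper identifies the dataset (six MNIST training images: five of digit 0, one of digit 1) but not which specific images or label encoding were used. A plausible reconstruction, assuming torchvision for the download and taking the first examples of each class:

```python
import numpy as np
from torchvision import datasets

# Download MNIST and flatten each 28x28 image into a 784-dim vector in [0, 1].
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.numpy().reshape(-1, 784) / 255.0
labels = mnist.targets.numpy()

# Five digit-0 images and one digit-1 image, matching the paper's setup.
# Which six images the authors used is not specified; we take the first of each.
idx0 = np.where(labels == 0)[0][:5]
idx1 = np.where(labels == 1)[0][:1]
X = np.concatenate([images[idx0], images[idx1]])  # shape (6, 784)
y = np.array([-1] * 5 + [1])                      # {-1, +1} labels (our encoding)
```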
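
Finally, a minimal reproduction sketch of the reported setup: a 784-dimensional linear model trained with the logistic loss plus L2 regularization, comparing ERM against importance weighting for µ ∈ {0.1, 10}. It reuses X and y from the dataset sketch above. The learning rate, step count, and inverse-class-frequency weights q_iw are assumptions made for speed; the paper trains for 10 million epochs and also runs Group DRO and a polynomially-tailed loss, both omitted here for brevity.

```python
import numpy as np

def train(X, y, q, mu, lr=0.01, steps=100_000):
    """Full-batch gradient descent on the q-weighted logistic loss
    plus L2 regularization (mu / 2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of the q-weighted logistic loss plus the L2 term.
        grad = -((q * y) / (1.0 + np.exp(margins))) @ X + mu * w
        w -= lr * grad
    return w

n = len(y)
q_erm = np.full(n, 1.0 / n)          # ERM: uniform weights
q_iw = np.array([0.2] * 5 + [1.0])   # importance weighting: inverse class frequency
q_iw /= q_iw.sum()                   # normalize to sum to 1

for mu in (0.1, 10.0):
    w_erm = train(X, y, q_erm, mu)
    w_iw = train(X, y, q_iw, mu)
    print(f"mu={mu}: ||w_ERM - w_IW|| = {np.linalg.norm(w_erm - w_iw):.4f}")
```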