Understanding the role of importance weighting for deep learning
Authors: Da Xu, Yuting Ye, Chuanwei Ruan
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern about the impact of importance weighting for over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as training proceeds. Nevertheless, a rigorous characterization of this phenomenon has been lacking. In this paper, we provide formal characterizations and theoretical justifications for the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We reveal both the optimization dynamics and the generalization performance under deep learning models. Our work not only explains the various novel phenomena observed for importance weighting in deep learning, but also extends to studies where the weights are optimized as part of the model, which applies to a number of topics under active research. The empirical results in Figure 2c provide numerical evidence of the strong effect of importance weighting on generalization behavior. |
| Researcher Affiliation | Collaboration | Da Xu, Walmart Labs, Sunnyvale, CA 94086, USA (DaXu5180@gmail.com); Yuting Ye, Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA (yeyt@berkeley.edu); Chuanwei Ruan, Instacart, San Francisco, CA 94107, USA (Ruanchuanwei@gmail.com) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to source code or explicitly state that it will be made available. |
| Open Datasets | No | The paper mentions generated datasets for illustrative experiments (Figure 1) but does not provide concrete access information (link, DOI, repository, or formal citation) indicating that they are publicly available. For example: "The generated datasets for our illustrative experiments are shown in Figure 1". |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed for reproduction. It mentions using 'testing data' (Figure 2c caption) but lacks details on validation splits: 'The generalization error on testing data (the remaining 80% of the orange class and 20% of the blue class that are not part of the down-sampling in Figure 1d) when the nonlinear model is trained under different class weights, as the training progresses;' |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It only mentions general techniques like "gradient descent". |
| Experiment Setup | Yes | All the models are trained with gradient descent using 0.1 as learning rate. We use the exponential loss and the standard normal initialization. (A minimal sketch of this setup appears after the table.) |
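
The quoted experiment setup (full-batch gradient descent with learning rate 0.1, the exponential loss, and standard normal initialization) can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: the 2-D toy dataset, the linear model, the number of steps, and the class weight of 5.0 are assumptions made here to visualize the phenomenon described in the abstract, namely that once the training data is separable, the direction the model converges to becomes largely insensitive to the importance weights.

```python
# Minimal sketch (assumed configuration, not the paper's exact experiment):
# importance-weighted gradient descent on the exponential loss with
# learning rate 0.1 and standard normal initialization, on separable toy data.
import numpy as np


def train(X, y, w, lr=0.1, steps=20000, seed=0):
    """Full-batch gradient descent on the weighted exponential loss
    (1/n) * sum_i w_i * exp(-y_i * theta^T x_i) for a linear model."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])  # standard normal initialization
    for _ in range(steps):
        margins = y * (X @ theta)
        # Clip the exponent in this toy sketch to avoid overflow early in training.
        grad = -(w * y * np.exp(np.minimum(-margins, 20.0))) @ X / len(y)
        theta -= lr * grad
    return theta


rng = np.random.default_rng(0)
n = 100  # points per class (assumed)
X = np.vstack([rng.normal([+1.0, +1.0], 0.3, size=(n, 2)),
               rng.normal([-1.0, -1.0], 0.3, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

uniform = np.ones_like(y)               # unweighted baseline
upweighted = np.where(y > 0, 5.0, 1.0)  # up-weight the positive class (assumed value)

theta_u = train(X, y, uniform)
theta_w = train(X, y, upweighted)

# Compare the normalized directions of the two learned classifiers.
dir_u = theta_u / np.linalg.norm(theta_u)
dir_w = theta_w / np.linalg.norm(theta_w)
print("cosine similarity of learned directions:", float(dir_u @ dir_w))
```

Under these assumptions, the two learned directions should come out nearly aligned: on separable data the weighted exponential loss drives the parameter norm to grow while its direction drifts toward a similar margin-maximizing solution regardless of (positive) importance weights, which is the diminishing-effect behavior the paper sets out to characterize formally.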