Understanding the role of importance weighting for deep learning
Authors: Da Xu, Yuting Ye, Chuanwei Ruan
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern about the impact of importance weighting for over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as training proceeds. Nevertheless, a rigorous characterization of this phenomenon has been lacking. In this paper, we provide formal characterizations and theoretical justifications for the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We reveal both the optimization dynamics and the generalization performance under deep learning models. Our work not only explains the various novel phenomena observed for importance weighting in deep learning, but also extends to studies where the weights are optimized as part of the model, which applies to a number of topics under active research. The empirical results in Figure 2c provide numerical evidence of the strong effect of importance weighting on generalization behavior. |
| Researcher Affiliation | Collaboration | Da Xu, Walmart Labs, Sunnyvale, CA 94086, USA (DaXu5180@gmail.com); Yuting Ye, Division of Biostatistics, University of California, Berkeley, Berkeley, CA 94720, USA (yeyt@berkeley.edu); Chuanwei Ruan, Instacart, San Francisco, CA 94107, USA (Ruanchuanwei@gmail.com) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to source code or explicitly state that it will be made available. |
| Open Datasets | No | The paper mentions generated datasets for illustrative experiments (Figure 1) but does not provide concrete access information (link, DOI, repository, or formal citation) indicating that they are publicly available. For example: "The generated datasets for our illustrative experiments are shown in Figure 1". |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed for reproduction. It mentions using 'testing data' (Figure 2c caption) but lacks details on validation splits: 'The generalization error on testing data (the remaining 80% of the orange class and 20% of the blue class that are not part of the down-sampling in Figure 1d) when the nonlinear model is trained under different class weights, as the training progresses;' |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It only mentions general techniques like "gradient descent". |
| Experiment Setup | Yes | All the models are trained with gradient descent using 0.1 as learning rate. We use the exponential loss and the standard normal initialization. (A minimal sketch of this setup appears after the table.) |
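
The quoted experiment setup (full-batch gradient descent with learning rate 0.1, the exponential loss, and standard normal initialization) can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: the 2-D toy dataset, the linear model, the number of steps, and the class weight of 5.0 are assumptions made here to visualize the phenomenon described in the abstract, namely that once the training data is separable, the direction the model converges to becomes largely insensitive to the importance weights.

```python
# Minimal sketch (assumed configuration, not the paper's exact experiment):
# importance-weighted gradient descent on the exponential loss with
# learning rate 0.1 and standard normal initialization, on separable toy data.
import numpy as np


def train(X, y, w, lr=0.1, steps=20000, seed=0):
    """Full-batch gradient descent on the weighted exponential loss
    (1/n) * sum_i w_i * exp(-y_i * theta^T x_i) for a linear model."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])  # standard normal initialization
    for _ in range(steps):
        margins = y * (X @ theta)
        # Clip the exponent in this toy sketch to avoid overflow early in training.
        grad = -(w * y * np.exp(np.minimum(-margins, 20.0))) @ X / len(y)
        theta -= lr * grad
    return theta


rng = np.random.default_rng(0)
n = 100  # points per class (assumed)
X = np.vstack([rng.normal([+1.0, +1.0], 0.3, size=(n, 2)),
               rng.normal([-1.0, -1.0], 0.3, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

uniform = np.ones_like(y)               # unweighted baseline
upweighted = np.where(y > 0, 5.0, 1.0)  # up-weight the positive class (assumed value)

theta_u = train(X, y, uniform)
theta_w = train(X, y, upweighted)

# Compare the normalized directions of the two learned classifiers.
dir_u = theta_u / np.linalg.norm(theta_u)
dir_w = theta_w / np.linalg.norm(theta_w)
print("cosine similarity of learned directions:", float(dir_u @ dir_w))
```

Under these assumptions, the two learned directions should come out nearly aligned: on separable data the weighted exponential loss drives the parameter norm to grow while its direction drifts toward a similar margin-maximizing solution regardless of (positive) importance weights, which is the diminishing-effect behavior the paper sets out to characterize formally.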