The Benefits of Mixup for Feature Learning
Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results verify our theoretical findings and demonstrate the effectiveness of the early-stopped Mixup training. |
| Researcher Affiliation | Academia | (1) Department of Computer Science & Institute of Data Science, The University of Hong Kong; (2) Department of Statistics and Actuarial Science & Department of Mathematics, The University of Hong Kong; (3) Machine Learning Department, Carnegie Mellon University; (4) Department of Computer Science, University of California, Los Angeles. |
| Pseudocode | No | The paper describes training algorithms using mathematical equations and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include any links to a code repository. |
| Open Datasets | Yes | We conduct a proof-of-concept experiment on the CIFAR-10 dataset. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and synthetic data but does not specify any training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'SGD with momentum 0.9' and a 'ResNet18 model', but it does not specify version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For the two-layer CNN model and the training algorithm, we set the network width m = 10 and conduct full-batch gradient descent with learning rate η = 0.05 and total iteration number T = 20000. ... For ResNet18 and ResNet34, we set the learning rate to 0.1; for LeNet and VGG16, we set the learning rates to 0.02 and 0.1, respectively. (A hedged code sketch of this setup appears below the table.) |
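The setup details reported above are enough to sketch a standard Mixup training loop for the CIFAR-10 experiment. The snippet below is a minimal illustration only, not the authors' released code (the paper provides none): the torchvision ResNet-18, the batch size, the number of epochs, and the Beta(α, α) mixing distribution with α = 1.0 are all assumptions; only the SGD momentum 0.9 and the 0.1 learning rate come from the paper's description.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Hyperparameters: LR and MOMENTUM follow the paper's ResNet18 setup;
# ALPHA, EPOCHS, and the batch size are illustrative assumptions.
LR, MOMENTUM, ALPHA, EPOCHS = 0.1, 0.9, 1.0, 200

device = "cuda" if torch.cuda.is_available() else "cpu"
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM)
criterion = nn.CrossEntropyLoss()

def mixup_batch(x, y, alpha):
    """Convexly mix a batch with a random permutation of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    return x_mixed, y, y[perm], lam

for epoch in range(EPOCHS):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_mixed, y_a, y_b, lam = mixup_batch(x, y, ALPHA)
        optimizer.zero_grad()
        logits = model(x_mixed)
        # Mixup loss: the same convex combination applied to the two label sets.
        loss = lam * criterion(logits, y_a) + (1.0 - lam) * criterion(logits, y_b)
        loss.backward()
        optimizer.step()
```

Consistent with the paper's emphasis on early-stopped Mixup training, a practical variant of this loop would halt Mixup (or switch back to standard ERM) after a fixed number of epochs; where to place that switch point is a tuning choice, not something specified in the excerpts above.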