Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
Authors: Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show the proposed method achieves the state of the art generalization, calibration, and weakly supervised localization results compared to other mixup methods. We verify the performance of the proposed method by training classifiers on CIFAR-100, Tiny-ImageNet, ImageNet, and the Google commands dataset (Krizhevsky et al., 2009; Chrabaszcz et al., 2017; Deng et al., 2009; Warden, 2017). |
| Researcher Affiliation | Academia | Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song Department of Computer Science and Engineering, Seoul National University Neural Processing Research Center {janghyun,wonho.choo,grazinglion,hyunoh}@mllab.snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Iterative submodular minimization |
| Open Source Code | Yes | The source code is available at https://github.com/snu-mllab/Co-Mixup. |
| Open Datasets | Yes | We verify the performance of the proposed method by training classifiers on CIFAR-100, Tiny-ImageNet, ImageNet, and the Google commands dataset (Krizhevsky et al., 2009; Chrabaszcz et al., 2017; Deng et al., 2009; Warden, 2017). |
| Dataset Splits | No | The paper uses well-known datasets (CIFAR-100, Tiny-ImageNet, ImageNet) that have standard test sets, and it mentions 'validation accuracy' for the Google commands dataset and the 'ImageNet validation dataset' in Appendix G. However, it does not explicitly provide training/validation/test split percentages or sample counts for the datasets used, nor does it state how any non-standard splits were defined for each experiment. |
| Hardware Specification | Yes | For example, in the case of ImageNet training with 16 Intel i9-9980XE CPU cores and 4 NVIDIA RTX 2080Ti GPUs, Co-Mixup training requires 0.964s per batch, whereas the vanilla training without mixup requires 0.374s per batch. |
| Software Dependencies | No | The paper mentions 'pyGCO' (https://github.com/Borda/pyGCO) and the 'TensorFlow-Probability library' but does not specify their version numbers or the versions of other core software components like Python or the deep learning frameworks. |
| Experiment Setup | Yes | We use stochastic gradient descent with an initial learning rate of 0.2 decayed by factor 0.1 at epochs 100 and 200. We set the momentum as 0.9 and add a weight decay of 0.0001. In detail, we use (β, γ, η, τ) = (0.32, 1.0, 0.05, 0.83) for all of the experiments. Note that τ is normalized according to the size of inputs (n) and the ratio of the number of inputs and outputs (m/m′), and we use an isotropic Dirichlet distribution with α = 2 for prior p. For a compatibility matrix, we use ω = 0.001. |
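
The quoted "Experiment Setup" row maps onto a standard optimizer/scheduler configuration. Below is a minimal sketch assuming PyTorch and torchvision; the ResNet-18 stand-in, the 300-epoch horizon, and the choice of mixing three inputs per output are illustrative assumptions, not the authors' released implementation (see https://github.com/snu-mllab/Co-Mixup for that).

```python
# Minimal sketch of the training configuration quoted in the "Experiment Setup" row:
# SGD (lr 0.2, momentum 0.9, weight decay 1e-4), step decay at epochs 100/200, and an
# isotropic Dirichlet prior with alpha = 2. The ResNet-18 stand-in, 300-epoch horizon,
# and 3 inputs mixed per output are assumptions for illustration only.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # stand-in classifier, e.g. for CIFAR-100

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.2, momentum=0.9, weight_decay=1e-4
)
# Decay the learning rate by a factor of 0.1 at epochs 100 and 200.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 200], gamma=0.1
)

# Isotropic Dirichlet prior p with concentration alpha = 2 over the mixing ratios of
# the inputs combined into one output (3 inputs per output is an illustrative choice).
prior = torch.distributions.Dirichlet(torch.full((3,), 2.0))
ratios = prior.sample()  # one draw of mixing ratios; sums to 1

for epoch in range(300):
    # ... forward/backward over Co-Mixup mini-batches would go here ...
    scheduler.step()  # advance the step-decay schedule once per epoch
```

Note that in the paper the mixing ratios are obtained by optimizing the Co-Mixup objective rather than sampled directly; the Dirichlet above only illustrates the prior p with α = 2 mentioned in the quoted setup.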