Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
Authors: Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show the proposed method achieves the state of the art generalization, calibration, and weakly supervised localization results compared to other mixup methods. We verify the performance of the proposed method by training classifiers on CIFAR-100, Tiny-ImageNet, ImageNet, and the Google commands dataset (Krizhevsky et al., 2009; Chrabaszcz et al., 2017; Deng et al., 2009; Warden, 2017). |
| Researcher Affiliation | Academia | Jang-Hyun Kim, Wonho Choo, Hosan Jeong, Hyun Oh Song Department of Computer Science and Engineering, Seoul National University Neural Processing Research Center {janghyun,wonho.choo,grazinglion,hyunoh}@mllab.snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Iterative submodular minimization |
| Open Source Code | Yes | The source code is available at https://github.com/snu-mllab/Co-Mixup. |
| Open Datasets | Yes | We verify the performance of the proposed method by training classifiers on CIFAR-100, Tiny-ImageNet, ImageNet, and the Google commands dataset (Krizhevsky et al., 2009; Chrabaszcz et al., 2017; Deng et al., 2009; Warden, 2017). |
| Dataset Splits | No | The paper uses well-known datasets (CIFAR-100, Tiny-ImageNet, ImageNet) that have standard test sets, and it mentions 'validation accuracy' for the Google commands dataset and the 'ImageNet validation dataset' in Appendix G. However, it does not explicitly provide training/validation/test split percentages or sample counts for the datasets used, nor does it state how any non-standard splits were defined for each experiment. |
| Hardware Specification | Yes | For example, in the case of ImageNet training with 16 Intel i9-9980XE CPU cores and 4 NVIDIA RTX 2080Ti GPUs, Co-Mixup training requires 0.964s per batch, whereas the vanilla training without mixup requires 0.374s per batch. |
| Software Dependencies | No | The paper mentions 'pyGCO' (https://github.com/Borda/pyGCO) and the 'TensorFlow-Probability library' but does not specify their version numbers or the versions of other core software components like Python or the deep learning frameworks. |
| Experiment Setup | Yes | We use stochastic gradient descent with an initial learning rate of 0.2 decayed by factor 0.1 at epochs 100 and 200. We set the momentum as 0.9 and add a weight decay of 0.0001. In detail, we use (β, γ, η, τ) = (0.32, 1.0, 0.05, 0.83) for all of the experiments. Note that τ is normalized according to the size of inputs (n) and the ratio of the number of inputs and outputs (m/m′), and we use an isotropic Dirichlet distribution with α = 2 for prior p. For a compatibility matrix, we use ω = 0.001. |
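
The quoted "Experiment Setup" row maps onto a standard optimizer/scheduler configuration. Below is a minimal sketch assuming PyTorch and torchvision; the ResNet-18 stand-in, the 300-epoch horizon, and the choice of mixing three inputs per output are illustrative assumptions, not the authors' released implementation (see https://github.com/snu-mllab/Co-Mixup for that).

```python
# Minimal sketch of the training configuration quoted in the "Experiment Setup" row:
# SGD (lr 0.2, momentum 0.9, weight decay 1e-4), step decay at epochs 100/200, and an
# isotropic Dirichlet prior with alpha = 2. The ResNet-18 stand-in, 300-epoch horizon,
# and 3 inputs mixed per output are assumptions for illustration only.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # stand-in classifier, e.g. for CIFAR-100

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.2, momentum=0.9, weight_decay=1e-4
)
# Decay the learning rate by a factor of 0.1 at epochs 100 and 200.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 200], gamma=0.1
)

# Isotropic Dirichlet prior p with concentration alpha = 2 over the mixing ratios of
# the inputs combined into one output (3 inputs per output is an illustrative choice).
prior = torch.distributions.Dirichlet(torch.full((3,), 2.0))
ratios = prior.sample()  # one draw of mixing ratios; sums to 1

for epoch in range(300):
    # ... forward/backward over Co-Mixup mini-batches would go here ...
    scheduler.step()  # advance the step-decay schedule once per epoch
```

Note that in the paper the mixing ratios are obtained by optimizing the Co-Mixup objective rather than sampled directly; the Dirichlet above only illustrates the prior p with α = 2 mentioned in the quoted setup.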