Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
Authors: Jang-Hyun Kim, Wonho Choo, Hyun Oh Song
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets, and the source code is available at https://github.com/snu-mllab/PuzzleMix. We train and evaluate classifiers on CIFAR-100 (Krizhevsky & Geoffrey, 2009), Tiny-ImageNet (Chrabaszcz et al., 2017), and ImageNet (Deng et al., 2009) datasets. We first study the generalization performance and adversarial robustness of our method (Section 6.1). Next, we show that our method can be used in conjunction with the existing augmentation method (AugMix) to simultaneously improve the corruption robustness and generalization performance (Section 6.2). Finally, we perform ablation studies for our method (Section 6.3). |
| Researcher Affiliation | Academia | Jang-Hyun Kim 1 2 Wonho Choo 1 2 Hyun Oh Song 1 2 1Department of Computer Science and Engineering, Seoul National University, Seoul, Korea 2Neural Processing Research Center. Correspondence to: Hyun Oh Song <hyunoh@snu.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 Masked Transport — Input: mask z, cost C, large value v; Initialize C^(0) = C, t = 0; repeat: target = argmin(C^(t), dim=1); Π = 0_{n×n}; for i = 0 to n−1: Π[i, target[i]] = 1; C_conflict = C^(t) ⊙ Π + v(1 − Π); source = argmin(C_conflict, dim=0); Π_win = 0_{n×n}; for j = 0 to n−1: Π_win[source[j], j] = 1; Π_win = Π_win ⊙ Π; Π_lose = (1 − Π_win) ⊙ Π; C^(t+1) = C^(t) + v·Π_lose; t = t + 1; until convergence; Return: Π_win. Algorithm 2 Stochastic Adversarial Puzzle Mix — Input: data x_0, x_1, attack ball ϵ, step τ, probability p; x_{i,clean} = x_i for i = 0, 1; Sample ν_i ~ B(1, p) for i = 0, 1; for i = 0, 1: if ν_i == 1 then κ_i ~ Uniform(−ϵ, ϵ), x_i ← x_i + κ_i; Calculate gradient ∇_x ℓ(x_i) for i = 0, 1; Optimize z and Π_i in Equation (3); Sample δ ~ Uniform(0, 1); for i = 0, 1: if ν_i == 1 then κ_i ← κ_i + τ·sign(∇_x ℓ(x_i)), κ_i ← clip(κ_i, −ϵ, ϵ), x_i ← x_{i,clean} + δ·κ_i; Return: (1 − z) ⊙ Π_0^⊤ x_0 + z ⊙ Π_1^⊤ x_1. (PyTorch sketches of both algorithms appear after the table.) |
| Open Source Code | Yes | Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets, and the source code is available at https://github.com/snu-mllab/PuzzleMix. |
| Open Datasets | Yes | We train and evaluate classifiers on CIFAR-100 (Krizhevsky & Geoffrey, 2009), Tiny-ImageNet (Chrabaszcz et al., 2017), and ImageNet (Deng et al., 2009) datasets. |
| Dataset Splits | No | The paper cites the datasets used (CIFAR-100, Tiny-ImageNet, ImageNet) which have standard splits, but it does not explicitly provide percentages or counts for training, validation, and test splits within the main text. It mentions 'validation' indirectly in the context of other methods, but not specifically for its own experimental setup. |
| Hardware Specification | No | The paper states: "All of the computations in our algorithm except α-β swap are done in mini-batch and can be performed in parallel in GPUs." However, it does not specify any particular GPU model, CPU type, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions: "to solve the discrete optimization problem with respect to the mask z, we use α-β swap algorithm from the pyGCO python wrapper" and "for-loops in Algorithm 1 can be done in parallel by using the scatter function of PyTorch (Paszke et al., 2017)". While it names PyTorch and pyGCO, it does not provide specific version numbers for these software dependencies. (A scatter illustration appears after the table.) |
| Experiment Setup | Yes | We follow the training protocol of Verma et al. (2019), which trains WRN28-10 for 400 epochs and PreActResNet18 for 1200 epochs. Hyperparameter settings are available in Supplementary C.1.; In our experiments, we use label space L = {0, 1/2, 1}. In addition, we randomly sample the size of the graph, i.e., size of mask z, from {2×2, 4×4, 8×8, 16×16}, and down-sample the given mini-batch for all experiments.; For the mixing ratio λ, we randomly sample λ from Beta(α, α) at each mini-batch. (A sampling sketch appears after the table.) |
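For readers who want to trace Algorithm 1 concretely, here is a minimal PyTorch sketch of the masked-transport iteration, reconstructed from the pseudocode quoted above. The function name, the default penalty `v`, and the iteration cap are our assumptions; as in the quoted pseudocode, the mask z is taken to be already folded into the cost matrix C.

```python
import torch

def masked_transport(C, v=1e4, max_iters=100):
    """Sketch of Algorithm 1 (Masked Transport): rows repeatedly bid for
    their cheapest columns; losing bids are penalized until no conflicts
    remain. Returns a binary assignment Pi_win."""
    n = C.size(0)
    C_t = C.clone()
    Pi_win = torch.zeros(n, n)
    for _ in range(max_iters):
        # Each row proposes its cheapest column.
        target = C_t.argmin(dim=1)
        Pi = torch.zeros(n, n)
        Pi[torch.arange(n), target] = 1
        # Among conflicting proposals, the cheapest row wins each column.
        C_conflict = C_t * Pi + v * (1 - Pi)
        source = C_conflict.argmin(dim=0)
        Pi_win = torch.zeros(n, n)
        Pi_win[source, torch.arange(n)] = 1
        Pi_win = Pi_win * Pi
        # Losing proposals get their cost raised so rows try elsewhere.
        Pi_lose = (1 - Pi_win) * Pi
        if Pi_lose.sum() == 0:       # converged: every proposal won
            break
        C_t = C_t + v * Pi_lose
    return Pi_win
```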
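The stochastic adversarial step in Algorithm 2 can be sketched the same way. The snippet below covers only the perturbation logic for a single input; the joint optimization of the mask z and transport plans Π_i (Equation (3)) is elided, and `loss_fn` is a hypothetical callable mapping an input batch to a scalar loss.

```python
import torch

def stochastic_perturb(x, loss_fn, eps, tau, p):
    """Perturbation steps of Algorithm 2 (Stochastic Adversarial Puzzle
    Mix) for one input; the mask/transport optimization is elided."""
    x = x.detach().clone()
    x_clean = x.clone()
    nu = torch.bernoulli(torch.tensor(p)).item()   # nu ~ B(1, p)
    kappa = torch.zeros_like(x)
    if nu == 1:
        kappa = torch.empty_like(x).uniform_(-eps, eps)
        x = x_clean + kappa                        # random start in the eps-ball
    x.requires_grad_(True)
    loss_fn(x).backward()                          # gradient of the loss at x
    delta = torch.rand(()).item()                  # delta ~ Uniform(0, 1)
    if nu == 1:
        kappa = (kappa + tau * x.grad.sign()).clamp(-eps, eps)
        x = x_clean + delta * kappa                # scaled adversarial step
    return x.detach()
```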
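On the software-dependencies row, the review quotes the paper's remark that the for-loops in Algorithm 1 parallelize via PyTorch's scatter. A minimal illustration of that pattern (tensor names are ours, not the paper's):

```python
import torch

n = 4
C = torch.rand(n, n)                     # example cost matrix
target = C.argmin(dim=1)                 # cheapest column per row
# One scatter call replaces the "for i = 0 to n-1: Pi[i, target[i]] = 1" loop.
Pi = torch.zeros(n, n).scatter(1, target.unsqueeze(1), 1.0)
```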
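Finally, the per-mini-batch sampling in the experiment-setup row is straightforward to reproduce. The α value below is a placeholder; the actual hyperparameters are given in the paper's Supplementary C.1.

```python
import torch

alpha = 1.0                                            # placeholder; see Supplementary C.1
lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing ratio lambda
grid = [2, 4, 8, 16][torch.randint(4, (1,)).item()]    # mask z is grid x grid
mask_shape = (grid, grid)
```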