Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
Authors: Jang-Hyun Kim, Wonho Choo, Hyun Oh Song
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets, and the source code is available at https://github.com/snu-mllab/PuzzleMix. We train and evaluate classifiers on CIFAR-100 (Krizhevsky & Geoffrey, 2009), Tiny-ImageNet (Chrabaszcz et al., 2017), and ImageNet (Deng et al., 2009) datasets. We first study the generalization performance and adversarial robustness of our method (Section 6.1). Next, we show that our method can be used in conjunction with the existing augmentation method (AugMix) to simultaneously improve the corruption robustness and generalization performance (Section 6.2). Finally, we perform ablation studies for our method (Section 6.3). |
| Researcher Affiliation | Academia | Jang-Hyun Kim 1 2 Wonho Choo 1 2 Hyun Oh Song 1 2 1Department of Computer Science and Engineering, Seoul National University, Seoul, Korea 2Neural Processing Research Center. Correspondence to: Hyun Oh Song <hyunoh@snu.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 Masked Transport — Input: mask z, cost C, large value v; Initialize C^(0) = C, t = 0; repeat: target = argmin(C^(t), dim=1); Π = 0_{n×n}; for i = 0 to n−1: Π[i, target[i]] = 1; C_conflict = C^(t) ⊙ Π + v(1 − Π); source = argmin(C_conflict, dim=0); Π_win = 0_{n×n}; for j = 0 to n−1: Π_win[source[j], j] = 1; Π_win = Π_win ⊙ Π; Π_lose = (1 − Π_win) ⊙ Π; C^(t+1) = C^(t) + v·Π_lose; t = t + 1; until convergence; Return: Π_win. Algorithm 2 Stochastic Adversarial Puzzle Mix — Input: data x_0, x_1, attack ball ϵ, step τ, probability p; x_{i,clean} = x_i for i = 0, 1; Sample ν_i ~ B(1, p) for i = 0, 1; for i = 0, 1: if ν_i == 1 then κ_i ~ Uniform(−ϵ, ϵ), x_i ← x_i + κ_i; Calculate gradient ∇_x ℓ(x_i) for i = 0, 1; Optimize z and Π_i in Equation (3); Sample δ ~ Uniform(0, 1); for i = 0, 1: if ν_i == 1 then κ_i ← κ_i + τ·sign(∇_x ℓ(x_i)), κ_i ← clip(κ_i, −ϵ, ϵ), x_i ← x_{i,clean} + δ·κ_i; Return: (1 − z) ⊙ Π_0^⊤ x_0 + z ⊙ Π_1^⊤ x_1. (PyTorch sketches of both algorithms appear after the table.) |
| Open Source Code | Yes | Our experiments show Puzzle Mix achieves the state of the art generalization and the adversarial robustness results compared to other mixup methods on CIFAR-100, Tiny-ImageNet, and ImageNet datasets, and the source code is available at https://github.com/snu-mllab/PuzzleMix. |
| Open Datasets | Yes | We train and evaluate classifiers on CIFAR-100 (Krizhevsky & Geoffrey, 2009), Tiny-ImageNet (Chrabaszcz et al., 2017), and ImageNet (Deng et al., 2009) datasets. |
| Dataset Splits | No | The paper cites the datasets used (CIFAR-100, Tiny-ImageNet, ImageNet) which have standard splits, but it does not explicitly provide percentages or counts for training, validation, and test splits within the main text. It mentions 'validation' indirectly in the context of other methods, but not specifically for its own experimental setup. |
| Hardware Specification | No | The paper states: "All of the computations in our algorithm except α-β swap are done in mini-batch and can be performed in parallel in GPUs." However, it does not specify any particular GPU model, CPU type, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions: "to solve the discrete optimization problem with respect to the mask z, we use α-β swap algorithm from the pyGCO python wrapper" and "for-loops in Algorithm 1 can be done in parallel by using the scatter function of PyTorch (Paszke et al., 2017)". While it names PyTorch and pyGCO, it does not provide specific version numbers for these software dependencies. (A scatter illustration appears after the table.) |
| Experiment Setup | Yes | We follow the training protocol of Verma et al. (2019), which trains WRN28-10 for 400 epochs and PreActResNet18 for 1200 epochs. Hyperparameter settings are available in Supplementary C.1.; In our experiments, we use label space L = {0, 1/2, 1}. In addition, we randomly sample the size of the graph, i.e., size of mask z, from {2×2, 4×4, 8×8, 16×16}, and down-sample the given mini-batch for all experiments.; For the mixing ratio λ, we randomly sample λ from Beta(α, α) at each mini-batch. (A sampling sketch appears after the table.) |
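For readers who want to trace Algorithm 1 concretely, here is a minimal PyTorch sketch of the masked-transport iteration, reconstructed from the pseudocode quoted above. The function name, the default penalty `v`, and the iteration cap are our assumptions; as in the quoted pseudocode, the mask z is taken to be already folded into the cost matrix C.

```python
import torch

def masked_transport(C, v=1e4, max_iters=100):
    """Sketch of Algorithm 1 (Masked Transport): rows repeatedly bid for
    their cheapest columns; losing bids are penalized until no conflicts
    remain. Returns a binary assignment Pi_win."""
    n = C.size(0)
    C_t = C.clone()
    Pi_win = torch.zeros(n, n)
    for _ in range(max_iters):
        # Each row proposes its cheapest column.
        target = C_t.argmin(dim=1)
        Pi = torch.zeros(n, n)
        Pi[torch.arange(n), target] = 1
        # Among conflicting proposals, the cheapest row wins each column.
        C_conflict = C_t * Pi + v * (1 - Pi)
        source = C_conflict.argmin(dim=0)
        Pi_win = torch.zeros(n, n)
        Pi_win[source, torch.arange(n)] = 1
        Pi_win = Pi_win * Pi
        # Losing proposals get their cost raised so rows try elsewhere.
        Pi_lose = (1 - Pi_win) * Pi
        if Pi_lose.sum() == 0:       # converged: every proposal won
            break
        C_t = C_t + v * Pi_lose
    return Pi_win
```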
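The stochastic adversarial step in Algorithm 2 can be sketched the same way. The snippet below covers only the perturbation logic for a single input; the joint optimization of the mask z and transport plans Π_i (Equation (3)) is elided, and `loss_fn` is a hypothetical callable mapping an input batch to a scalar loss.

```python
import torch

def stochastic_perturb(x, loss_fn, eps, tau, p):
    """Perturbation steps of Algorithm 2 (Stochastic Adversarial Puzzle
    Mix) for one input; the mask/transport optimization is elided."""
    x = x.detach().clone()
    x_clean = x.clone()
    nu = torch.bernoulli(torch.tensor(p)).item()   # nu ~ B(1, p)
    kappa = torch.zeros_like(x)
    if nu == 1:
        kappa = torch.empty_like(x).uniform_(-eps, eps)
        x = x_clean + kappa                        # random start in the eps-ball
    x.requires_grad_(True)
    loss_fn(x).backward()                          # gradient of the loss at x
    delta = torch.rand(()).item()                  # delta ~ Uniform(0, 1)
    if nu == 1:
        kappa = (kappa + tau * x.grad.sign()).clamp(-eps, eps)
        x = x_clean + delta * kappa                # scaled adversarial step
    return x.detach()
```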
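On the software-dependencies row, the review quotes the paper's remark that the for-loops in Algorithm 1 parallelize via PyTorch's scatter. A minimal illustration of that pattern (tensor names are ours, not the paper's):

```python
import torch

n = 4
C = torch.rand(n, n)                     # example cost matrix
target = C.argmin(dim=1)                 # cheapest column per row
# One scatter call replaces the "for i = 0 to n-1: Pi[i, target[i]] = 1" loop.
Pi = torch.zeros(n, n).scatter(1, target.unsqueeze(1), 1.0)
```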
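Finally, the per-mini-batch sampling in the experiment-setup row is straightforward to reproduce. The α value below is a placeholder; the actual hyperparameters are given in the paper's Supplementary C.1.

```python
import torch

alpha = 1.0                                            # placeholder; see Supplementary C.1
lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing ratio lambda
grid = [2, 4, 8, 16][torch.randint(4, (1,)).item()]    # mask z is grid x grid
mask_shape = (grid, grid)
```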