Harnessing Hard Mixed Samples with Decoupled Regularizer

Authors: Zicheng Liu, Siyuan Li, Ge Wang, Lirong Wu, Cheng Tan, Stan Z. Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on supervised and semi-supervised learning benchmarks across seven datasets validate the effectiveness of DM as a plug-and-play module.
Researcher Affiliation | Academia | Zicheng Liu1,2, Siyuan Li1,2, Ge Wang1,2, Cheng Tan1,2, Lirong Wu1,2, Stan Z. Li2; AI Lab, Research Center for Industries of the Future, Hangzhou, China; 1Zhejiang University; 2Westlake University; {liuzicheng; lisiyuan; wangge; tanchen; lirongwu; stan.zq.li}@westlake.edu.cn
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Source code and models are available at https://github.com/Westlake-AI/openmixup.
Open Datasets | Yes | This subsection evaluates performance gains of DM on six image classification benchmarks, including CIFAR-100 [25], Tiny-ImageNet (Tiny) [10], ImageNet-1k [48], CUB-200-2011 (CUB) [59], FGVC-Aircraft (Aircraft) [42]. For semi-supervised transfer learning (TL) benchmarks [71], TL experiments are performed on CUB, Aircraft, and Stanford-Cars [24] (Cars). (See the data-loading sketch after this table.)
Dataset Splits | Yes | Tiny-ImageNet [10] is a rescaled version of ImageNet-1k, which has 100,000 training images and 10,000 validation images of 200 classes at 64x64 resolution. ImageNet-1k [26] contains 1,281,167 training images and 50,000 validation images of 1,000 classes at 224x224 resolution.
Hardware Specification | Yes | Total training hours and GPU memory are collected on a single A100 GPU. (See the profiling sketch after this table.)
Software Dependencies | No | The paper mentions software such as PyTorch, the SGD, AdamW, and LAMB optimizers, OpenMixup [32], and TorchSSL [76], but it does not specify version numbers for these components or for other dependencies such as Python or CUDA.
Experiment Setup | Yes | The SGD optimizer and a cosine learning rate scheduler [39] are used with a weight decay of 0.0001, momentum of 0.9, and a batch size of 100; all methods train for 800 epochs with a base learning rate lr = 0.1 on CIFAR-100 and for 400 epochs with lr = 0.2 on Tiny-ImageNet. (See the training-setup sketch after this table.)
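
The datasets cited in the Open Datasets row are publicly downloadable. As a minimal sketch (not taken from the paper), CIFAR-100 can be loaded with torchvision as shown below; the data root, normalization statistics, and the crop/flip augmentations are assumptions, not the authors' exact pipeline. Only the batch size of 100 is taken from the reported setup.

```python
# Minimal CIFAR-100 loading sketch (assumed transforms; not the paper's exact pipeline).
import torch
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),       # standard CIFAR augmentation (assumption)
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR-100 statistics (assumption)
                         (0.2673, 0.2564, 0.2762)),
])

train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100, shuffle=True, num_workers=4)
```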
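
For the Hardware Specification row, the sketch below shows one way the reported quantities (total training hours and peak GPU memory on a single GPU) could be collected in PyTorch. The `train_and_profile` helper, and its model, loader, and criterion arguments, are hypothetical placeholders rather than the authors' measurement code.

```python
# Sketch: measuring wall-clock training time and peak GPU memory on one GPU (placeholder model/loader).
import time
import torch

def train_and_profile(model, loader, criterion, optimizer, epochs, device="cuda"):
    model.to(device)
    torch.cuda.reset_peak_memory_stats(device)   # start peak-memory tracking from zero
    start = time.time()
    for _ in range(epochs):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
    hours = (time.time() - start) / 3600.0                         # total training hours
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3    # peak GPU memory in GiB
    return hours, peak_gb
```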
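
The Experiment Setup row maps directly onto a standard PyTorch optimizer/scheduler configuration. Below is a minimal sketch of the reported CIFAR-100 settings (SGD, lr = 0.1, momentum 0.9, weight decay 0.0001, 800 epochs), assuming `CosineAnnealingLR` as the cosine scheduler and a ResNet-18 backbone as a placeholder; it is not claimed to be the authors' training script.

```python
# Sketch of the reported CIFAR-100 optimization settings (placeholder backbone; not the authors' script).
import torch
from torchvision.models import resnet18   # placeholder backbone (assumption)

epochs = 800                               # 800 epochs on CIFAR-100 (400 with lr=0.2 on Tiny-ImageNet)
model = resnet18(num_classes=100)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,                                # base learning rate for CIFAR-100
    momentum=0.9,
    weight_decay=1e-4,
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one training pass over the loader (batch size 100) goes here ...
    scheduler.step()
```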