DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA-sequence, visual, and bio-feature datasets. |
| Researcher Affiliation | Collaboration | 1) AI Lab, Research Center for Industries of the Future, Westlake University, China; 2) DAMO Academy, Alibaba Group; 3) National University of Singapore; 4) Hupan Lab, Zhejiang Province. |
| Pseudocode | Yes | Algorithm 1: The DiffAug Training Algorithm. |
| Open Source Code | Yes | The code for review is released at https://github.com/zangzelin/code_diffaug. |
| Open Datasets | Yes | Our experiments utilize the Genomic Benchmarks (Grešová et al., 2023), encompassing datasets that target regulatory elements (such as promoters, enhancers, and open chromatin regions) from three model organisms: humans, mice, and roundworms. |
| Dataset Splits | No | The paper consistently describes train and test splits for evaluation but does not explicitly provide details for a separate validation split. |
| Hardware Specification | Yes | Table 9 ("Details of the training process in vision dataset") reports, for CIFAR-10: learning rate 0.001, weight decay 1e-6, batch size 256, 1×V100 GPU, 32×32-pixel inputs, 7.1 hours of training. |
| Software Dependencies | No | The paper mentions general software such as a "Python package" and `torch` (implied by code snippets) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training strategy of DiffAug is A-Step: 200 epochs → B-Step: 400 epochs → A-Step: 800 epochs. |
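The reported three-phase schedule (A-Step: 200 epochs, B-Step: 400 epochs, A-Step: 800 epochs) can be sketched as a minimal loop driver. This is an assumption-laden illustration of the phase ordering only: the `a_step` and `b_step` callables are placeholders for the paper's actual A-Step/B-Step procedures (Algorithm 1), which this review does not reproduce.

```python
# Hedged sketch of the DiffAug training schedule reported in the paper:
# A-Step (200 epochs) -> B-Step (400 epochs) -> A-Step (800 epochs).
# The phase contents are NOT from the paper; a_step/b_step are
# hypothetical per-epoch callables standing in for Algorithm 1.

SCHEDULE = [("A", 200), ("B", 400), ("A", 800)]

def run_schedule(a_step, b_step):
    """Drive the three-phase schedule, calling the phase's
    per-epoch function once per epoch; returns the phases run."""
    log = []
    for phase, epochs in SCHEDULE:
        step = a_step if phase == "A" else b_step
        for epoch in range(epochs):
            step(epoch)  # one epoch of the current phase
        log.append((phase, epochs))
    return log
```

Under this sketch, the A-Step procedure runs for 1,000 total epochs (200 + 800) and the B-Step for 400, matching the schedule quoted above.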