DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation
Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets. |
| Researcher Affiliation | Collaboration | (1) AI Lab, Research Center for Industries of the Future, Westlake University, China; (2) DAMO Academy, Alibaba Group; (3) National University of Singapore; (4) Hupan Lab, Zhejiang Province. |
| Pseudocode | Yes | Algorithm 1: The DiffAug Training Algorithm. |
| Open Source Code | Yes | The code for review is released at https://github.com/zangzelin/code_diffaug. |
| Open Datasets | Yes | Our experiments utilize the Genomic Benchmarks (Grešová et al., 2023), encompassing datasets that target regulatory elements (such as promoters, enhancers, and open chromatin regions) from three model organisms: humans, mice, and roundworms. |
| Dataset Splits | No | The paper consistently describes train and test splits for evaluation, but does not explicitly provide details for a separate validation dataset split. |
| Hardware Specification | Yes | Table 9 (details of the training process on the vision dataset) reports for CF10: learning rate 0.001, weight decay 1e-6, batch size 256, GPU 1×V100, image size 32×32 pixels, training time 7.1 hours. |
| Software Dependencies | No | The paper mentions general software like 'Python package' and `torch` (implied by code snippets) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training strategy of DiffAug is A-Step: 200 epochs → B-Step: 400 epochs → A-Step: 800 epochs (a minimal sketch of this schedule follows the table). |
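
Based on the schedule and the Table 9 hyperparameters quoted above, the following is a minimal sketch of the reported three-phase training (A-Step 200 epochs → B-Step 400 epochs → A-Step 800 epochs). The function names `train_encoder_epoch` and `train_augmenter_epoch`, and the assumption that the A-Step updates the encoder while the B-Step updates the diffusion-based augmenter, are illustrative placeholders; Algorithm 1 in the paper and the released code at https://github.com/zangzelin/code_diffaug are the authoritative references.

```python
import torch

def train_diffaug(encoder, augmenter, loader, device="cuda"):
    """Sketch of the reported A-Step/B-Step schedule; not the authors' implementation."""
    # Hyperparameters quoted from Table 9 (CF10 row): lr 0.001, weight decay 1e-6.
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3, weight_decay=1e-6)
    opt_aug = torch.optim.Adam(augmenter.parameters(), lr=1e-3, weight_decay=1e-6)

    def train_encoder_epoch():
        # Assumed A-Step: update the encoder with a contrastive objective on
        # diffusion-generated views (loss left as a placeholder).
        encoder.train()
        augmenter.eval()
        for x, _ in loader:
            x = x.to(device)
            opt_enc.zero_grad()
            # loss = contrastive_loss(encoder(x), encoder(augmenter(x)))  # placeholder
            # loss.backward(); opt_enc.step()

    def train_augmenter_epoch():
        # Assumed B-Step: update the diffusion-based augmenter (loss left as a placeholder).
        augmenter.train()
        encoder.eval()
        for x, _ in loader:
            x = x.to(device)
            opt_aug.zero_grad()
            # loss = diffusion_denoising_loss(augmenter, x)  # placeholder
            # loss.backward(); opt_aug.step()

    # Reported schedule: A-Step 200 epochs -> B-Step 400 epochs -> A-Step 800 epochs.
    for _ in range(200):
        train_encoder_epoch()
    for _ in range(400):
        train_augmenter_epoch()
    for _ in range(800):
        train_encoder_epoch()
```

Under the quoted setup, `loader` would be a `torch.utils.data.DataLoader` with batch size 256 over 32×32-pixel images, and the full run is reported to take about 7.1 hours on a single V100.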