reproducibilityindex.ai

DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
Researcher Affiliation	Collaboration	(1) AI Lab, Research Center for Industries of the Future, Westlake University, China; (2) DAMO Academy, Alibaba Group; (3) National University of Singapore; (4) Hupan Lab, Zhejiang Province.
Pseudocode	Yes	Algorithm 1: The DiffAug Training Algorithm
Open Source Code	Yes	The code for review is released at https://github.com/zangzelin/code_diffaug.
Open Datasets	Yes	Our experiments utilize the Genomic Benchmarks (Grešová et al., 2023), encompassing datasets that target regulatory elements (such as promoters, enhancers, and open chromatin regions) from three model organisms: humans, mice, and roundworms.
Dataset Splits	No	The paper consistently describes train and test splits for evaluation, but does not explicitly provide details for a separate validation dataset split.
Hardware Specification	Yes	Table 9. Details of the training process in vision dataset. CF10 Learning Rate Weight Decay Batch Size GPU pix Training Time CF10 1 0.001 1e-6 256 1*V100 32×32 7.1 hours
Software Dependencies	No	The paper mentions general software like 'Python package' and `torch` (implied by code snippets) but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	The training strategy of DiffAug is A-Step: 200 epochs → B-Step: 400 epochs → A-Step: 800 epochs.
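
The staged schedule in the Experiment Setup row can be written down as a small lookup from a global epoch to the active phase. This is a minimal sketch, not the authors' code: the names `SCHEDULE`, `phase_at`, and `total_epochs` are hypothetical, and it assumes each reported number is the length of that phase rather than a cumulative epoch index. This summary does not say what the A-Step and B-Step actually train, so the sketch only tracks the phase label.

```python
from itertools import accumulate

# (phase label, number of epochs), in the order reported in the summary.
# Assumption: each number is the duration of that phase, not a cumulative
# epoch counter.
SCHEDULE = [("A", 200), ("B", 400), ("A", 800)]


def total_epochs(schedule=SCHEDULE):
    """Total number of epochs across all phases."""
    return sum(n for _, n in schedule)


def phase_at(epoch, schedule=SCHEDULE):
    """Return the phase label active at a 0-based global epoch."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Running totals give the exclusive end epoch of each phase.
    ends = list(accumulate(n for _, n in schedule))
    for (label, _), end in zip(schedule, ends):
        if epoch < end:
            return label
    raise ValueError(f"epoch {epoch} is past the end of the schedule")


if __name__ == "__main__":
    for e in (0, 199, 200, 599, 600, total_epochs() - 1):
        print(e, phase_at(e))
```

Under the duration assumption the run totals 1400 epochs. If the numbers are instead cumulative milestones, only the `SCHEDULE` entries need to change.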