Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation
Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets. |
| Researcher Affiliation | Collaboration | AI Lab, Research Center for Industries of the Future, Westlake University, China; DAMO Academy, Alibaba Group; National University of Singapore; Hupan Lab, Zhejiang Province. |
| Pseudocode | Yes | Algorithm 1: The DiffAug Training Algorithm |
| Open Source Code | Yes | The code for review is released at https://github.com/zangzelin/code_diffaug. |
| Open Datasets | Yes | Our experiments utilize the Genomic Benchmarks (Grešová et al., 2023), encompassing datasets that target regulatory elements (such as promoters, enhancers, and open chromatin regions) from three model organisms: humans, mice, and roundworms. |
| Dataset Splits | No | The paper consistently describes train and test splits for evaluation, but does not explicitly provide details for a separate validation dataset split. |
| Hardware Specification | Yes | Table 9. Details of the training process on the vision dataset. CF10: learning rate 0.001, weight decay 1e-6, batch size 256, GPU 1×V100, image size 32×32 pixels, training time 7.1 hours. |
| Software Dependencies | No | The paper mentions general software like 'Python package' and `torch` (implied by code snippets) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The training strategy of DiffAug is A-Step: 200 epochs → B-Step: 400 epochs → A-Step: 800 epochs. |
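
The A-Step/B-Step schedule in the last row can be made concrete with a small scheduling helper. This is a minimal sketch, not the authors' training loop: the row does not say what each step trains, nor whether the epoch counts are phase lengths or cumulative endpoints. The code therefore assumes phase lengths by default and adds a hypothetical `from_cumulative` helper for the other reading. `SCHEDULE`, `phase_for_epoch`, `from_cumulative`, and `total_epochs` are illustrative names, not part of the released code.

```python
from typing import List, Tuple

# Schedule as read from the report: (phase name, number of epochs).
# ASSUMPTION: each count is the length of that phase, not a cumulative endpoint.
SCHEDULE: List[Tuple[str, int]] = [("A", 200), ("B", 400), ("A", 800)]


def from_cumulative(endpoints: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: convert (phase, last_epoch) endpoints into (phase, length) pairs."""
    lengths: List[Tuple[str, int]] = []
    previous = 0
    for phase, end in endpoints:
        if end <= previous:
            raise ValueError("cumulative endpoints must be strictly increasing")
        lengths.append((phase, end - previous))
        previous = end
    return lengths


def total_epochs(schedule: List[Tuple[str, int]]) -> int:
    """Total number of epochs covered by a (phase, length) schedule."""
    return sum(length for _, length in schedule)


def phase_for_epoch(epoch: int, schedule: List[Tuple[str, int]]) -> str:
    """Return the phase active at a 0-based epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for phase, length in schedule:
        if epoch < start + length:
            return phase
        start += length
    raise ValueError(f"epoch {epoch} is past the end of the schedule ({start} epochs)")


if __name__ == "__main__":
    # Phase boundaries under the phase-length reading.
    for e in (0, 199, 200, 599, 600, 1399):
        print(e, phase_for_epoch(e, SCHEDULE))
    print("total (phase lengths):", total_epochs(SCHEDULE))
    print("total (cumulative):", total_epochs(from_cumulative(SCHEDULE)))
```

Under the phase-length reading the run spans 1,400 epochs; under the cumulative reading it spans 800 (A for 200, B for 200, then A for 400). A training loop would call `phase_for_epoch` once per epoch to decide which step to run.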