Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time

Authors: Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on natural language generation and machine translation tasks demonstrate the superior performance of our method in terms of both generation speed and sample quality compared to existing methods for discrete diffusion models."
Researcher Affiliation | Academia | "Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, {chenzx19,hzyuan,yongqianl,evankou,jkzhang,qgu}@cs.ucla.edu"
Pseudocode | Yes | "Algorithm 1 Sampling From DNDM" (an illustrative sketch of this sampling style appears after the table)
Open Source Code | Yes | "Codes are available at https://github.com/uclaml/DNDM."
Open Datasets | Yes | "Datasets. We use the following three datasets to compare with the baselines for machine translation tasks: (1) IWSLT14 DE-EN (Cettolo et al., 2014)... (2) WMT14 EN-DE (Bojar et al., 2014)... and (3) WMT16 EN-RO (Bojar et al., 2016)... The natural language generation task is evaluated on two language datasets following Hoogeboom et al. (2021b): text8 and enwik8."
Dataset Splits | Yes | "The train-validation-test split is fixed across all experiments for all machine translation datasets to ensure fair comparison."
Hardware Specification | Yes | "For the fairness of comparison, all the experiments are conducted using a single NVIDIA RTX A6000 GPU with 48 GB memory."
Software Dependencies | No | The paper mentions Fairseq (Ott et al., 2019) and the GPT-2 and GPT-2-large models, but does not provide specific version numbers for key software components or libraries.
Experiment Setup | Yes | "In all experiments, the batch size is chosen to be 100. For RDM and RDM-k, our hyperparameter settings follow the original paper (Zheng et al., 2023) except for the batch size... We train 12-layer Transformers for both text8 and enwik8 datasets for 500 epochs with the cosine schedule... During training, we employ a learning rate of 0.0001, a weight decay parameter of 0.99, and the Adam optimizer." (a hedged configuration sketch based on these quoted values follows the sampling sketch below)
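
The Pseudocode row above cites "Algorithm 1 Sampling From DNDM". The snippet below is a minimal sketch of the general idea only, not a reproduction of the authors' Algorithm 1: each token position is assigned a transition time before the reverse process starts, and the denoising network is queried only at steps where at least one token transitions, which is what reduces the number of network evaluations. The function name, the uniform transition-time distribution, and the toy denoiser are assumptions made for illustration; the paper derives the transition-time distribution from its noise schedule.

    # Illustrative-only sketch of sampling with predetermined transition times
    # (not the authors' Algorithm 1). Assumed names: sample_with_predetermined_transitions,
    # toy_denoiser. The uniform choice of transition times is an assumption.
    import torch

    def sample_with_predetermined_transitions(denoiser, seq_len, num_steps, mask_id):
        # Start from the fully masked (absorbing) state.
        x = torch.full((seq_len,), mask_id, dtype=torch.long)

        # Predetermine one transition time per position (uniform here for
        # illustration; the paper ties this distribution to the noise schedule).
        tau = torch.randint(1, num_steps + 1, (seq_len,))

        for t in range(num_steps, 0, -1):
            positions = (tau == t).nonzero(as_tuple=True)[0]
            if positions.numel() == 0:
                continue  # no token transitions at this step: skip the network call
            logits = denoiser(x, t)                          # (seq_len, vocab_size)
            probs = torch.softmax(logits[positions], dim=-1)
            x[positions] = torch.multinomial(probs, num_samples=1).squeeze(-1)
        return x

    # Toy usage with a random stand-in for the trained Transformer denoiser.
    def toy_denoiser(x, t, vocab_size=32):
        return torch.randn(x.shape[0], vocab_size)

    sample = sample_with_predetermined_transitions(
        toy_denoiser, seq_len=16, num_steps=50, mask_id=0)
    print(sample)

In this sketch the number of denoiser calls is at most the number of distinct transition times, which is the intuition behind the speedup the paper reports.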
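
The Experiment Setup row quotes concrete training hyperparameters. The hedged configuration sketch below collects only those quoted values (batch size 100, 12-layer Transformer, 500 epochs, cosine schedule, learning rate 1e-4, weight decay 0.99, Adam); the model width, head count, and data pipeline are placeholder assumptions, not values reported in the paper.

    # Hedged configuration sketch based only on the quoted setup description.
    # d_model and nhead below are placeholders, not reported values.
    import torch
    import torch.nn as nn

    config = {
        "batch_size": 100,
        "num_layers": 12,        # 12-layer Transformer for text8 / enwik8
        "epochs": 500,
        "learning_rate": 1e-4,
        "weight_decay": 0.99,    # value as quoted in the setup description
    }

    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    model = nn.TransformerEncoder(encoder_layer, num_layers=config["num_layers"])

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=config["learning_rate"],
        weight_decay=config["weight_decay"],
    )
    # Cosine schedule over the full training run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=config["epochs"])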