Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time

Authors: Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on natural language generation and machine translation tasks demonstrate the superior performance of our method in terms of both generation speed and sample quality compared to existing methods for discrete diffusion models.
Researcher Affiliation | Academia | Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu; Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095
Pseudocode | Yes | Algorithm 1: Sampling From DNDM
Open Source Code | Yes | Codes are available at https://github.com/uclaml/DNDM.
Open Datasets | Yes | Datasets. We use the following three datasets to compare with the baselines for machine translation tasks: (1) IWSLT14 DE-EN (Cettolo et al., 2014)... (2) WMT14 EN-DE (Bojar et al., 2014)... and (3) WMT16 EN-RO (Bojar et al., 2016)... The natural language generation task is evaluated on two language datasets following Hoogeboom et al. (2021b): text8 and enwik8.
Dataset Splits | Yes | The train-validation-test split is fixed across all experiments for all machine translation datasets to ensure fair comparison.
Hardware Specification | Yes | For the fairness of comparison, all the experiments are conducted using a single NVIDIA RTX A6000 GPU with 48 GB memory.
Software Dependencies | No | The paper mentions 'Fairseq (Ott et al., 2019)', the 'GPT2 model', and the 'GPT2-large model', but does not provide specific version numbers for key software components or libraries.
Experiment Setup | Yes | In all experiments, the batch size is chosen to be 100. For RDM and RDM-k, our hyperparameter settings follow the original paper (Zheng et al., 2023) except for the batch size... We train 12-layer Transformers for both text8 and enwik8 datasets for 500 epochs with the cosine schedule... During training, we employ a learning rate of 0.0001, a weight decay parameter of 0.99, and the Adam optimizer.
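The experiment setup quoted above trains for 500 epochs with a cosine schedule and a base learning rate of 0.0001. As a minimal stdlib sketch of what such a cosine-annealed schedule looks like (the exact form used by the authors may differ, e.g. with warmup or a nonzero floor; `min_lr` here is an assumption for illustration):

```python
import math

def cosine_lr(epoch, total_epochs=500, base_lr=1e-4, min_lr=0.0):
    """Cosine-annealed learning rate: decays from base_lr at epoch 0
    to min_lr at total_epochs along a half cosine wave."""
    progress = epoch / max(1, total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, `cosine_lr(0)` returns the full base rate of 1e-4, while `cosine_lr(500)` has decayed to the floor.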
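The paper's Algorithm 1 samples from DNDM using transition times that are drawn before sampling begins, so the denoising network only needs to be evaluated at steps where some token actually transitions. The sketch below illustrates that idea only; the function names, the uniform transition-time distribution, and the stand-in `denoise_fn` are assumptions for this illustration, not the paper's actual algorithm (which derives the transition-time distribution from the noise schedule):

```python
import random

def sample_transition_times(seq_len, num_steps, rng=random):
    # Draw one transition time per token position up front.
    # Uniform here for illustration; the paper specifies the distribution.
    return [rng.randrange(1, num_steps + 1) for _ in range(seq_len)]

def dndm_style_sampling(seq_len, num_steps, denoise_fn, mask_token="[MASK]", rng=random):
    """Illustrative sampler: because transition times are predetermined,
    steps at which no token transitions are skipped entirely, so the
    number of denoiser calls is at most seq_len rather than num_steps."""
    times = sample_transition_times(seq_len, num_steps, rng)
    x = [mask_token] * seq_len
    # Group positions by their transition time, then visit only those times.
    schedule = {}
    for pos, t in enumerate(times):
        schedule.setdefault(t, []).append(pos)
    for t in sorted(schedule, reverse=True):  # reverse time: T -> 1
        preds = denoise_fn(x, t)              # one network call per active step
        for pos in schedule[t]:
            x[pos] = preds[pos]
    return x
```

With a sequence of 8 tokens and 100 diffusion steps, the denoiser is called at most 8 times, which is the source of the speedup the paper reports.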