Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
Authors: Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling-Zhen Li, Ray-I Chang, Hung-yi Lee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to verify the performance of our Meta-DiffuB on four benchmark Seq2Seq datasets [48, 6, 17, 8]. |
| Researcher Affiliation | Collaboration | Yun-Yen Chuang (1,2), Hung-Min Hsu (3), Kevin Lin (4), Chen-Sheng Gu (1,2), Ling-Zhen Li (1,2), Ray-I Chang (2), Hung-yi Lee (2); 1: Maxora AI, 2: National Taiwan University, 3: University of Washington, 4: Microsoft |
| Pseudocode | Yes | Algorithm 1: Meta-DiffuB |
| Open Source Code | Yes | Code and datasets for Meta-DiffuB are available at: https://github.com/Meta-DiffuB/Meta-DiffuB |
| Open Datasets | Yes | In our experiment, we use four datasets: the Commonsense Conversation dataset (CC) [48], the Quasar-T dataset (QT) [6], the Wiki-Auto dataset (WA) [17], and the Quora Question Pairs dataset (QQP) [8]. |
| Dataset Splits | Yes | The training set contains 3,382,137 pairs, the development set has 2,048 pairs, and the test set includes 10,000 pairs. |
| Hardware Specification | Yes | Experiments are conducted on NVIDIA A100 Tensor Core GPUs, utilizing 4 GPUs for training and a single GPU for inference. |
| Software Dependencies | No | The paper mentions general software components like 'Transformer model' and 'LSTM' but does not provide specific version numbers for any libraries or dependencies. |
| Experiment Setup | Yes | The diffusion step count is set at 2,000, and the maximum sequence length is 128. The Minimum Bayes risk (MBR) [23] decoding size, denoted as |S|, is 10; this involves generating sentences from 10 random seeds and selecting the best output sequence. The total batch size for both training and testing phases is 2048. |
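The MBR decoding procedure quoted in the experiment setup is the one clearly algorithmic step in the table, so a minimal sketch may help readers checking reproducibility: generate |S| = 10 candidates from 10 random seeds, then keep the candidate with the highest expected utility against the other candidates. The paper cites [23] for MBR but this excerpt does not name the utility function, so a simple token-overlap F1 stands in for the usual BLEU-style utility; `model.sample` is a hypothetical API used only for illustration.

```python
from collections import Counter
from typing import Callable, List

def token_f1(hyp: List[str], ref: List[str]) -> float:
    """Token-overlap F1, a stand-in utility (assumption: the paper
    likely uses a BLEU-style utility for MBR, not specified here)."""
    if not hyp or not ref:
        return 0.0
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def mbr_select(candidates: List[List[str]],
               utility: Callable[[List[str], List[str]], float] = token_f1) -> List[str]:
    """Return the candidate with the highest total utility against the
    other |S| candidates (|S| = len(candidates), e.g. 10 in the paper)."""
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        # Expected utility of hyp, treating the other samples as pseudo-references.
        score = sum(utility(hyp, ref) for ref in candidates if ref is not hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Hypothetical usage: decode 10 candidates from 10 random seeds, then select one.
# candidates = [model.sample(src, seed=s) for s in range(10)]  # made-up API
# output = mbr_select(candidates)
```

With |S| = 10 as reported, this consensus step smooths over bad samples from individual seeds at roughly 10x the decoding cost of a single sample.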