Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

Authors: Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling-Zhen Li, Ray-I Chang, Hung-yi Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
------------------------ | ------ | -------------
Research Type | Experimental | "In this section, we conduct experiments to verify the performance of our Meta-DiffuB on four benchmark Seq2Seq datasets [48, 6, 17, 8]."
Researcher Affiliation | Collaboration | Yun-Yen Chuang (1,2), Hung-Min Hsu (3), Kevin Lin (4), Chen-Sheng Gu (1,2), Ling-Zhen Li (1,2), Ray-I Chang (2), Hung-yi Lee (2); affiliations: 1 Maxora AI, 2 National Taiwan University, 3 University of Washington, 4 Microsoft
Pseudocode | Yes | Algorithm 1: Meta-DiffuB (a hedged sketch of this training loop follows the table)
Open Source Code | Yes | "Code and datasets for Meta-DiffuB are available at: https://github.com/Meta-DiffuB/Meta-DiffuB"
Open Datasets | Yes | "In our experiment, we use four datasets: the Commonsense Conversation dataset (CC) [48], the Quasar-T dataset (QT) [6], the Wiki-Auto dataset (WA) [17], and the Quora Question Pairs dataset (QQP) [8]."
Dataset Splits | Yes | "The training set contains 3,382,137 pairs, the development set has 2,048 pairs, and the test set includes 10,000 pairs."
Hardware Specification | Yes | "Experiments are conducted on NVIDIA A100 Tensor Core GPUs, utilizing 4 GPUs for training and a single GPU for inference."
Software Dependencies | No | The paper mentions general software components such as the Transformer model and LSTM, but provides no version numbers for any libraries or dependencies.
Experiment Setup | Yes | "The diffusion step count is set at 2,000, and the maximum sequence length is 128. The Minimum Bayes risk (MBR) [23] decoding size, denoted as |S|, is 10; this involves generating sentences from 10 random seeds and selecting the best output sequence. The total batch size for both training and testing phases is 2048." (An MBR decoding sketch follows the algorithm sketch below.)
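
Since the table only names Algorithm 1, the following is a minimal sketch of the scheduler-exploiter training loop that the paper's title and framing describe: a scheduler proposes contextualized noise for each sentence, a Seq2Seq text diffusion model (the exploiter) trains under that schedule, and the scheduler is meta-updated from the exploiter's feedback. Every interface here (`scheduler.sample`, `diffusion_loss`, the REINFORCE-style reward) is an illustrative assumption, not the authors' code; consult the official repository for the real implementation.

```python
# Hypothetical sketch of the Meta-DiffuB scheduler-exploiter loop.
# All interfaces below are assumptions for illustration only; see
# https://github.com/Meta-DiffuB/Meta-DiffuB for the authors' code.
import torch


def meta_diffub_step(scheduler, exploiter, sched_opt, expl_opt, src, tgt):
    """One combined training step on a batch of (source, target) pairs."""
    # 1) The scheduler proposes a contextualized (per-sentence) noise
    #    schedule; log_prob scores the sampled schedule for REINFORCE.
    schedule, log_prob = scheduler.sample(src)
    schedule = schedule.detach()  # the exploiter update should not
                                  # backpropagate into the scheduler

    # 2) The exploiter (a Seq2Seq text diffusion model) takes an
    #    ordinary diffusion training step under that schedule.
    loss = exploiter.diffusion_loss(src, tgt, schedule)
    expl_opt.zero_grad()
    loss.backward()
    expl_opt.step()

    # 3) Meta-exploration: reward the scheduler by how much the
    #    exploiter improved under its proposed schedule (one simple
    #    choice of reward signal among many possible ones).
    with torch.no_grad():
        reward = loss.detach() - exploiter.diffusion_loss(src, tgt, schedule)
    sched_loss = -(reward * log_prob).mean()
    sched_opt.zero_grad()
    sched_loss.backward()
    sched_opt.step()
    return loss.item()
```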
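
The Experiment Setup row describes Minimum Bayes risk decoding with |S| = 10: sample ten candidate outputs from ten random seeds and keep the one with the highest expected utility against the others. Below is a minimal sketch of that selection step, assuming sentence-level BLEU as the utility (a common choice, not necessarily the paper's exact metric); `generate` is a hypothetical stand-in for one reverse-diffusion sampling run of the trained model.

```python
# Minimal MBR decoding sketch: |S| candidates, pairwise BLEU utility.
# generate(src, seed) is a hypothetical sampling call, not a real API.
from sacrebleu.metrics import BLEU


def mbr_select(generate, src, num_candidates=10):
    bleu = BLEU(effective_order=True)  # suits sentence-level scoring
    candidates = [generate(src, seed=s) for s in range(num_candidates)]

    def expected_utility(i):
        # Average BLEU of candidate i against all other candidates:
        # a Monte Carlo estimate of its expected utility under the model.
        return sum(
            bleu.sentence_score(candidates[i], [candidates[j]]).score
            for j in range(num_candidates)
            if j != i
        ) / (num_candidates - 1)

    best = max(range(num_candidates), key=expected_utility)
    return candidates[best]
```

Because the utility averages over the other samples, MBR favors the most "central" candidate rather than trusting any single random draw, which is why it tends to be more robust than one-shot sampling from a diffusion decoder.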