Latent Diffusion for Language Generation

Authors: Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

NeurIPS 2023

Reproducibility assessment. Each variable below is listed with the assessed result and the supporting LLM response.

Research Type: Experimental
LLM Response: We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse datasets that our latent language diffusion models are significantly more effective than previous diffusion language models. Our code is available at https://github.com/justinlovelace/latent-diffusion-for-language.

Researcher Affiliation: Academia
LLM Response: Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger (Cornell University, Ithaca, NY).

Pseudocode: No
LLM Response: No pseudocode or clearly labeled algorithm block was found in the paper.

Open Source Code: Yes
LLM Response: Our code is available at https://github.com/justinlovelace/latent-diffusion-for-language.

Open Datasets: Yes
LLM Response: We evaluate LD4LG on a variety of natural language datasets. ROCStories [42] is a corpus of 98k five-sentence commonsense stories... The AG News Topic Classification [60] dataset consists of news articles across four topics... The XSum [44] dataset consists of BBC articles... The QQP [9] dataset consists of 400k question pairs... The WMT 2014 English-German [5] dataset is a widely used machine translation dataset...

Dataset Splits: Yes
LLM Response: ROCStories [42]: The dataset consists of 98,161 instances. We hold out 1,000 instances for validation, 4,000 instances for testing, and utilize the remaining 93,161 instances for training. AG News Topic Classification [60]: We hold out 1,000 instances from the training set for validation. We therefore utilize 119k training instances, 1,000 validation instances, and 7,600 test instances. XSum [44]: 204,045 training instances, 11,332 validation instances, and 11,334 test instances. QQP [9]: 144,715 training instances, 2,048 validation instances, and 2,500 test instances. WMT 2014 English-German [5]: The validation and testing splits each have roughly 3k paired sentences. (A hold-out sketch in code appears after this list.)

Hardware Specification: Yes
LLM Response: We train all of our diffusion models with a single Nvidia A6000 GPU except for the machine translation models, which are trained with 4 Nvidia A6000 GPUs. All of the models presented in this work are trained on a single Nvidia A6000 except for the DiffuSeq XSum baseline, which was trained with two Nvidia A6000s.

Software Dependencies: No
LLM Response: The paper mentions software such as the Huggingface evaluate library and spaCy (https://spacy.io/) but does not provide version numbers for these dependencies, which are required for reproducibility.

Experiment Setup: Yes
LLM Response: We present the training details across the different datasets in Table 14. We tuned hyperparameters using the validation MAUVE scores for the ROCStories dataset and found that they generally transferred well across datasets. We therefore used the same hyperparameters across datasets, except that we utilized the L1 loss instead of the L2 loss for the Seq2Seq tasks. Table 14 (Training details for LD4LG across different datasets) lists, among other settings, the sampling timesteps, noise schedule, regression loss, transformer layers, learning rate, batch size, warmup steps, weight decay, dropout, gradient clipping, and training steps. (A sketch of the regression-loss choice appears after this list.)
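The ROCStories hold-out described under Dataset Splits (1,000 validation and 4,000 test instances held out from 98,161, leaving 93,161 for training) can be illustrated with a simple random split. The sketch below is a minimal example under that assumption; the shuffling, fixed seed, and function name are illustrative and not the authors' exact procedure.

```python
import random

def rocstories_split(instances, n_val=1_000, n_test=4_000, seed=0):
    # Split sizes follow the hold-out reported above
    # (93,161 train / 1,000 val / 4,000 test out of 98,161 instances).
    # The shuffle and fixed seed are assumptions, not the paper's
    # documented selection procedure.
    data = list(instances)
    random.Random(seed).shuffle(data)
    val = data[:n_val]
    test = data[n_val:n_val + n_test]
    train = data[n_val + n_test:]
    return train, val, test
```

Applied to the 98,161 ROCStories instances, this yields the 93,161/1,000/4,000 split reported above.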
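The Experiment Setup entry notes that one set of hyperparameters was reused across datasets, with the L1 loss replacing the L2 loss for the Seq2Seq tasks. The PyTorch sketch below illustrates only that regression-loss switch; the function and argument names are hypothetical and not taken from the released code.

```python
import torch.nn.functional as F

def latent_regression_loss(pred_latent, target_latent, seq2seq=False):
    # Per the setup quoted above: L1 loss for the Seq2Seq tasks,
    # L2 (MSE) loss otherwise. Names here are illustrative only.
    if seq2seq:
        return F.l1_loss(pred_latent, target_latent)
    return F.mse_loss(pred_latent, target_latent)
```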