Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hierarchical Text Generation and Planning for Strategic Dialogue
Authors: Denis Yarats, Mike Lewis
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach increases the end-task reward achieved by the model, improves the effectiveness of long-term planning using rollouts, and allows self-play reinforcement learning to improve decision making without diverging from human language. Our hierarchical latent-variable model outperforms previous work both linguistically and strategically. (Abstract) |
| Researcher Affiliation | Industry | 1Facebook AI Research, Menlo Park, CA. Correspondence to: Denis Yarats <EMAIL>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code. |
| Open Datasets | Yes | We focus on the negotiation task introduced by Lewis et al. (2017), as it possess both linguistic and reasoning challenges. Lewis et al. collected a corpus of human dialogues on a multi-issue bargaining task, where the agents must divide a collection of items of 3 different types (books, hats and balls) between them. (Section 2.1) |
| Dataset Splits | No | The paper mentions "validation perplexity" in Section 9.1 and Table 1, but does not provide specific details about the training/validation/test dataset splits (e.g., percentages, sample counts, or standard split references). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We used the following hyper-parameters: embeddings and hidden states have 256 dimensions; for each unique agreement space A we learn 50 discrete latent message representations. During training, we optimize the parameters using RMSProp (Tieleman & Hinton, 2012) with initial learning rate 0.0005 and momentum µ = 0.1, clipping of gradients whose L2 norm exceeds 1. We train the models for 15 epochs with mini-batch size of 16. (Section 9.1) |
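
The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a reproduction. The sketch below is illustrative only: the paper released no code, so the class name, the field names, and the `clip_by_l2_norm` helper are hypothetical, and treating "clipping" as a rescaling of the whole gradient vector to norm 1 is an assumption rather than something the quoted text confirms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted from Section 9.1; all field names are hypothetical."""
    embedding_dim: int = 256
    hidden_dim: int = 256
    latent_messages_per_agreement_space: int = 50
    optimizer: str = "RMSProp"
    learning_rate: float = 0.0005
    momentum: float = 0.1
    max_grad_l2_norm: float = 1.0
    epochs: int = 15
    batch_size: int = 16


def clip_by_l2_norm(grads, max_norm):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm.

    Assumption: clipping is done by rescaling the full vector; the paper
    does not say whether it clips globally or per parameter.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


setup = ExperimentSetup()
# A gradient with L2 norm 5 is scaled down to norm 1; direction is preserved.
clipped = clip_by_l2_norm([3.0, 4.0], setup.max_grad_l2_norm)
```

Settings the table marks as unreported (dataset splits, hardware, software versions) are deliberately left out of the object rather than guessed.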