Hierarchical Text Generation and Planning for Strategic Dialogue
Authors: Denis Yarats, Mike Lewis
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach increases the end-task reward achieved by the model, improves the effectiveness of long-term planning using rollouts, and allows self-play reinforcement learning to improve decision making without diverging from human language. Our hierarchical latent-variable model outperforms previous work both linguistically and strategically. (Abstract) |
| Researcher Affiliation | Industry | Facebook AI Research, Menlo Park, CA. Correspondence to: Denis Yarats <denisy@fb.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code. |
| Open Datasets | Yes | We focus on the negotiation task introduced by Lewis et al. (2017), as it possesses both linguistic and reasoning challenges. Lewis et al. collected a corpus of human dialogues on a multi-issue bargaining task, where the agents must divide a collection of items of 3 different types (books, hats and balls) between them. (Section 2.1) |
| Dataset Splits | No | The paper mentions "validation perplexity" in Section 9.1 and Table 1, but does not provide specific details about the training/validation/test dataset splits (e.g., percentages, sample counts, or standard split references). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We used the following hyper-parameters: embeddings and hidden states have 256 dimensions; for each unique agreement space A we learn 50 discrete latent message representations. During training, we optimize the parameters using RMSProp (Tieleman & Hinton, 2012) with initial learning rate 0.0005 and momentum µ = 0.1, clipping of gradients whose L2 norm exceeds 1. We train the models for 15 epochs with mini-batch size of 16. (Section 9.1) See the configuration sketch below. |
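
The quoted hyper-parameters map directly onto a standard optimizer setup. Below is a minimal, hypothetical PyTorch sketch of that configuration: only the numeric values (256-dimensional embeddings and hidden states, RMSProp with learning rate 0.0005 and momentum 0.1, gradient clipping at L2 norm 1, 15 epochs, mini-batch size 16) come from Section 9.1 of the paper. The `ToyLM` module, `VOCAB_SIZE`, and the random data generator are placeholder assumptions, not the authors' hierarchical latent-variable model.

```python
# Hypothetical training-configuration sketch based on Section 9.1.
# Only the hyper-parameter values come from the paper; the model and data
# below are placeholder stand-ins, not the authors' architecture.
import torch
import torch.nn as nn

EMBED_DIM = 256      # embeddings and hidden states have 256 dimensions
NUM_LATENTS = 50     # 50 discrete latent message codes per agreement space (not modeled here)
LR = 5e-4            # initial RMSProp learning rate
MOMENTUM = 0.1       # RMSProp momentum
CLIP_NORM = 1.0      # clip gradients whose L2 norm exceeds 1
EPOCHS = 15
BATCH_SIZE = 16
VOCAB_SIZE = 1000    # assumed vocabulary size, not reported in the excerpt


class ToyLM(nn.Module):
    """Single-layer GRU language model standing in for the paper's model."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, EMBED_DIM, batch_first=True)
        self.out = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)


model = ToyLM()
optimizer = torch.optim.RMSprop(model.parameters(), lr=LR, momentum=MOMENTUM)
loss_fn = nn.CrossEntropyLoss()


def batches(num_batches=10, seq_len=20):
    """Random token batches standing in for the negotiation dialogues."""
    for _ in range(num_batches):
        yield torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, seq_len))


for _ in range(EPOCHS):
    for tokens in batches():
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        # Clip gradients whose global L2 norm exceeds 1, per Section 9.1.
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
        optimizer.step()
```

The gradient-clipping call applies the global L2 clipping described in the quote; the 50 discrete latent message representations per agreement space are recorded as a constant only, since the excerpt does not describe how they are parameterized.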