Hierarchical Text Generation and Planning for Strategic Dialogue

Authors: Denis Yarats, Mike Lewis

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our approach increases the end-task reward achieved by the model, improves the effectiveness of long-term planning using rollouts, and allows self-play reinforcement learning to improve decision making without diverging from human language. Our hierarchical latent-variable model outperforms previous work both linguistically and strategically." (Abstract)
Researcher Affiliation | Industry | "Facebook AI Research, Menlo Park, CA. Correspondence to: Denis Yarats <denisy@fb.com>."
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code.
Open Datasets | Yes | "We focus on the negotiation task introduced by Lewis et al. (2017), as it possesses both linguistic and reasoning challenges. Lewis et al. collected a corpus of human dialogues on a multi-issue bargaining task, where the agents must divide a collection of items of 3 different types (books, hats and balls) between them." (Section 2.1) See the data-format sketch below the table.
Dataset Splits | No | The paper mentions "validation perplexity" in Section 9.1 and Table 1, but does not provide specific details about the training/validation/test dataset splits (e.g., percentages, sample counts, or standard split references).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "We used the following hyper-parameters: embeddings and hidden states have 256 dimensions; for each unique agreement space A we learn 50 discrete latent message representations. During training, we optimize the parameters using RMSProp (Tieleman & Hinton, 2012) with initial learning rate 0.0005 and momentum μ = 0.1, clipping of gradients whose L2 norm exceeds 1. We train the models for 15 epochs with mini-batch size of 16." (Section 9.1) See the training-loop sketch below the table.
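
The Open Datasets row quotes the multi-issue bargaining corpus of Lewis et al. (2017). As a minimal sketch, assuming the publicly released plain-text format in which each line carries tagged `<input>`, `<dialogue>`, and `<output>` spans, with the input listing count/value pairs for the three item types, one example could be parsed as follows. The field layout and the `parse_example` helper are illustrative assumptions, not details stated in the paper:

```python
import re

def parse_example(line: str) -> dict:
    """Parse one line of the bargaining corpus (assumed released format)."""
    def span(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", line)
        return m.group(1).strip() if m else ""

    # Assumed layout: <input> holds a count/value pair for each item type.
    numbers = [int(t) for t in span("input").split()]
    items = ("book", "hat", "ball")
    context = {item: {"count": numbers[2 * i], "value": numbers[2 * i + 1]}
               for i, item in enumerate(items)}
    return {"context": context,
            "dialogue": span("dialogue"),
            "output": span("output")}

line = ("<input> 1 4 4 1 1 2 </input> "
        "<dialogue> THEM: i want the book <eos> YOU: deal <selection> </dialogue> "
        "<output> item0=0 item1=4 item2=1 item0=1 item1=0 item2=0 </output>")
print(parse_example(line)["context"]["book"])  # {'count': 1, 'value': 4}
```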
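
The training details quoted in the Experiment Setup row map directly onto a standard training configuration. Below is a minimal runnable sketch, assuming a PyTorch implementation (the paper does not name its framework); the model, data, and loss here are dummy stand-ins, while the optimizer, learning rate, momentum, gradient clipping, epoch count, and batch size follow Section 9.1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model with 256-dimensional embeddings and hidden states, as reported.
# (The paper's actual hierarchical latent-variable model is more involved.)
model = nn.Sequential(nn.Embedding(100, 256), nn.Flatten(), nn.Linear(256, 100))

# Dummy data so the sketch runs end to end; not from the paper.
inputs = torch.randint(0, 100, (160, 1))
targets = torch.randint(0, 100, (160,))

# RMSProp with the reported initial learning rate and momentum (Section 9.1).
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0005, momentum=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(15):                      # train for 15 epochs
    for i in range(0, len(inputs), 16):      # mini-batch size of 16
        x, y = inputs[i:i + 16], targets[i:i + 16]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Clip gradients whose L2 norm exceeds 1, as reported.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```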