Hierarchical Text Generation and Planning for Strategic Dialogue

Authors: Denis Yarats, Mike Lewis

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our approach increases the end-task reward achieved by the model, improves the effectiveness of long-term planning using rollouts, and allows self-play reinforcement learning to improve decision making without diverging from human language. Our hierarchical latent-variable model outperforms previous work both linguistically and strategically." (Abstract)
Researcher Affiliation | Industry | "Facebook AI Research, Menlo Park, CA. Correspondence to: Denis Yarats <denisy@fb.com>."
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code.
Open Datasets | Yes | "We focus on the negotiation task introduced by Lewis et al. (2017), as it possesses both linguistic and reasoning challenges. Lewis et al. collected a corpus of human dialogues on a multi-issue bargaining task, where the agents must divide a collection of items of 3 different types (books, hats and balls) between them." (Section 2.1) See the data-format sketch below the table.
Dataset Splits | No | The paper mentions "validation perplexity" in Section 9.1 and Table 1, but does not provide specific details about the training/validation/test dataset splits (e.g., percentages, sample counts, or standard split references).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "We used the following hyper-parameters: embeddings and hidden states have 256 dimensions; for each unique agreement space A we learn 50 discrete latent message representations. During training, we optimize the parameters using RMSProp (Tieleman & Hinton, 2012) with initial learning rate 0.0005 and momentum μ = 0.1, clipping of gradients whose L2 norm exceeds 1. We train the models for 15 epochs with mini-batch size of 16." (Section 9.1) See the training-loop sketch below the table.
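
The Open Datasets row quotes the multi-issue bargaining corpus of Lewis et al. (2017). As a minimal sketch, assuming the publicly released plain-text format in which each line carries tagged `<input>`, `<dialogue>`, and `<output>` spans, with the input listing count/value pairs for the three item types, one example could be parsed as follows. The field layout and the `parse_example` helper are illustrative assumptions, not details stated in the paper:

```python
import re

def parse_example(line: str) -> dict:
    """Parse one line of the bargaining corpus (assumed released format)."""
    def span(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", line)
        return m.group(1).strip() if m else ""

    # Assumed layout: <input> holds a count/value pair for each item type.
    numbers = [int(t) for t in span("input").split()]
    items = ("book", "hat", "ball")
    context = {item: {"count": numbers[2 * i], "value": numbers[2 * i + 1]}
               for i, item in enumerate(items)}
    return {"context": context,
            "dialogue": span("dialogue"),
            "output": span("output")}

line = ("<input> 1 4 4 1 1 2 </input> "
        "<dialogue> THEM: i want the book <eos> YOU: deal <selection> </dialogue> "
        "<output> item0=0 item1=4 item2=1 item0=1 item1=0 item2=0 </output>")
print(parse_example(line)["context"]["book"])  # {'count': 1, 'value': 4}
```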
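
The training details quoted in the Experiment Setup row map directly onto a standard training configuration. Below is a minimal runnable sketch, assuming a PyTorch implementation (the paper does not name its framework); the model, data, and loss here are dummy stand-ins, while the optimizer, learning rate, momentum, gradient clipping, epoch count, and batch size follow Section 9.1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model with 256-dimensional embeddings and hidden states, as reported.
# (The paper's actual hierarchical latent-variable model is more involved.)
model = nn.Sequential(nn.Embedding(100, 256), nn.Flatten(), nn.Linear(256, 100))

# Dummy data so the sketch runs end to end; not from the paper.
inputs = torch.randint(0, 100, (160, 1))
targets = torch.randint(0, 100, (160,))

# RMSProp with the reported initial learning rate and momentum (Section 9.1).
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0005, momentum=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(15):                      # train for 15 epochs
    for i in range(0, len(inputs), 16):      # mini-batch size of 16
        x, y = inputs[i:i + 16], targets[i:i + 16]
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Clip gradients whose L2 norm exceeds 1, as reported.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```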