Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning

Authors: Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, Huan Sun

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents.
Researcher Affiliation | Collaboration | The Ohio State University, University of Washington, Microsoft Research AI, U.S. Army Research Lab. {yao.470, sun.397}@osu.edu, {xiul, jfgao}@microsoft.com, brian.m.sadler6.civ@mail.mil
Pseudocode | Yes | All policies are stochastic in that the next subtask or action is sampled according to the probability distribution, which allows exploration in RL and lets the policies be optimized with policy gradient methods. In our experiments we used the REINFORCE algorithm (Williams 1992). Details are outlined in the Algorithm of the Appendix. (A minimal REINFORCE sketch follows the table.)
Open Source Code | Yes | All source code and documentation are available at https://github.com/LittleYUYU/Interactive-Semantic-Parsing.
Open Datasets | Yes | We utilize the 291,285 <recipe, description> pairs collected by Ur et al. (2016) for training and the 3,870 pairs from Quirk, Mooney, and Galley (2015) for testing.
Dataset Splits | Yes | 20% of the training data are randomly sampled as a validation set. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or other computing resources) used to run the experiments.
Software Dependencies | No | The paper mentions models and algorithms (e.g., the Latent Attention Model (LAM), a bidirectional GRU-RNN, and the REINFORCE algorithm) but does not specify software dependencies with version numbers (e.g., Python packages, libraries, or frameworks).
Experiment Setup | Yes | The word vector dimension is set to 50, the weight factor wd to 0.5, and the discount factor γ to 0.99. Max Local Turn is set to 5 and Max Global Turn to 4, which allows at most four subtasks. β trades off parsing accuracy against the number of questions: a larger β trains an agent that asks fewer questions but parses less accurately, while a smaller β leads to more questions and likely more accurate parses. With the validation set, we experimented with β ∈ {0.3, 0.4, 0.5} and observed that when β = 0.3, the number of questions raised by the HRL-based agents is still reasonable compared with LAM-rule/sup, while its parsing accuracy is much higher. (The reported values are collected in the config sketch following the table.)
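
For the Pseudocode row: the paper trains its stochastic policies with REINFORCE (Williams 1992). The sketch below is a minimal, generic REINFORCE update, not the authors' hierarchical implementation; the network architecture, environment interface, and all names are illustrative assumptions (only the discount factor γ = 0.99 comes from the paper).

```python
# Minimal REINFORCE sketch (Williams 1992). NOT the authors' code: network
# sizes, function names, and the episode interface are assumptions.
import torch
import torch.nn as nn

GAMMA = 0.99  # discount factor reported in the paper


class Policy(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, num_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        # Stochastic policy: the next subtask or action is *sampled* from this
        # distribution, which provides the exploration mentioned in the paper.
        return torch.distributions.Categorical(logits=self.net(state))


def reinforce_update(optimizer, log_probs, rewards):
    """One REINFORCE step over a finished episode.

    log_probs: list of log pi(a_t | s_t) tensors collected during the episode,
               e.g. dist = policy(s); a = dist.sample();
                    log_probs.append(dist.log_prob(a))
    rewards:   list of per-step scalar rewards.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):  # discounted return G_t, computed backwards
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```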
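
For the Dataset Splits row: a minimal sketch of randomly sampling 20% of the training pairs as a validation set. The pair format and the fixed seed are assumptions; the paper specifies only the 20% fraction and that sampling is random.

```python
# Sketch of the 20% random validation split; seed and data format assumed.
import random


def split_train_valid(pairs, valid_frac=0.2, seed=42):
    """Return (train, valid) from a list of <recipe, description> pairs."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_frac)
    return pairs[n_valid:], pairs[:n_valid]
```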
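
For the Experiment Setup row: the quoted hyperparameters collected in one place. Only the numeric values come from the paper; the key names are assumptions, and β = 0.3 reflects the validation-set choice described in the table.

```python
# Hyperparameters quoted in the Experiment Setup row. Key names are
# illustrative; only the values are taken from the paper.
CONFIG = {
    "word_vector_dim": 50,
    "weight_factor_wd": 0.5,
    "discount_gamma": 0.99,
    "max_local_turn": 5,   # per-subtask turn limit
    "max_global_turn": 4,  # allows four subtasks at most
    "beta": 0.3,           # chosen from {0.3, 0.4, 0.5} on the validation set
}
```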