Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning

Authors: Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, Huan Sun

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents.
Researcher Affiliation | Collaboration | The Ohio State University, University of Washington, Microsoft Research AI, U.S. Army Research Lab. {yao.470, sun.397}@osu.edu, {xiul, jfgao}@microsoft.com, brian.m.sadler6.civ@mail.mil
Pseudocode | Yes | All policies are stochastic in that the next subtask or action is sampled according to the probability distribution, which allows exploration in RL and lets the policies be optimized with policy gradient methods. In our experiments we used the REINFORCE algorithm (Williams 1992). Details are outlined in the Algorithm of the Appendix. (A minimal REINFORCE sketch follows the table.)
Open Source Code | Yes | All source code and documentation are available at https://github.com/LittleYUYU/Interactive-Semantic-Parsing.
Open Datasets | Yes | We utilize the 291,285 <recipe, description> pairs collected by Ur et al. (2016) for training and the 3,870 pairs from Quirk, Mooney, and Galley (2015) for testing.
Dataset Splits | Yes | 20% of the training data are randomly sampled as a validation set. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or other computing resources) used to run the experiments.
Software Dependencies | No | The paper mentions models and algorithms (e.g., the Latent Attention Model (LAM), a bidirectional GRU-RNN, and the REINFORCE algorithm) but does not specify software dependencies with version numbers (e.g., Python packages, libraries, or frameworks).
Experiment Setup | Yes | The word vector dimension is set to 50, the weight factor wd to 0.5, and the discount factor γ to 0.99. Max Local Turn is set to 5 and Max Global Turn to 4, which allows at most four subtasks. β trades off parsing accuracy against the number of questions: a larger β trains an agent that asks fewer questions but parses less accurately, while a smaller β leads to more questions and likely more accurate parses. With the validation set, we experimented with β ∈ {0.3, 0.4, 0.5} and observed that when β = 0.3, the number of questions raised by the HRL-based agents is still reasonable compared with LAM-rule/sup, while its parsing accuracy is much higher. (The reported values are collected in the config sketch following the table.)
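
For the Pseudocode row: the paper trains its stochastic policies with REINFORCE (Williams 1992). The sketch below is a minimal, generic REINFORCE update, not the authors' hierarchical implementation; the network architecture, environment interface, and all names are illustrative assumptions (only the discount factor γ = 0.99 comes from the paper).

```python
# Minimal REINFORCE sketch (Williams 1992). NOT the authors' code: network
# sizes, function names, and the episode interface are assumptions.
import torch
import torch.nn as nn

GAMMA = 0.99  # discount factor reported in the paper


class Policy(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, num_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        # Stochastic policy: the next subtask or action is *sampled* from this
        # distribution, which provides the exploration mentioned in the paper.
        return torch.distributions.Categorical(logits=self.net(state))


def reinforce_update(optimizer, log_probs, rewards):
    """One REINFORCE step over a finished episode.

    log_probs: list of log pi(a_t | s_t) tensors collected during the episode,
               e.g. dist = policy(s); a = dist.sample();
                    log_probs.append(dist.log_prob(a))
    rewards:   list of per-step scalar rewards.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):  # discounted return G_t, computed backwards
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```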
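
For the Dataset Splits row: a minimal sketch of randomly sampling 20% of the training pairs as a validation set. The pair format and the fixed seed are assumptions; the paper specifies only the 20% fraction and that sampling is random.

```python
# Sketch of the 20% random validation split; seed and data format assumed.
import random


def split_train_valid(pairs, valid_frac=0.2, seed=42):
    """Return (train, valid) from a list of <recipe, description> pairs."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_frac)
    return pairs[n_valid:], pairs[:n_valid]
```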
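
For the Experiment Setup row: the quoted hyperparameters collected in one place. Only the numeric values come from the paper; the key names are assumptions, and β = 0.3 reflects the validation-set choice described in the table.

```python
# Hyperparameters quoted in the Experiment Setup row. Key names are
# illustrative; only the values are taken from the paper.
CONFIG = {
    "word_vector_dim": 50,
    "weight_factor_wd": 0.5,
    "discount_gamma": 0.99,
    "max_local_turn": 5,   # per-subtask turn limit
    "max_global_turn": 4,  # allows four subtasks at most
    "beta": 0.3,           # chosen from {0.3, 0.4, 0.5} on the validation set
}
```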