Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning
Authors: Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, Huan Sun
AAAI 2019, pp. 2547-2554
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents. |
| Researcher Affiliation | Collaboration | The Ohio State University; University of Washington; Microsoft Research AI; U.S. Army Research Lab. {yao.470, sun.397}@osu.edu; {xiul, jfgao}@microsoft.com; brian.m.sadler6.civ@mail.mil |
| Pseudocode | Yes | All policies are stochastic in that the next subtask or action is sampled according to the predicted probability distribution, which allows exploration in RL and lets the policies be optimized with policy gradient methods. In our experiments we used the REINFORCE algorithm (Williams 1992). Details are outlined in the algorithm in the Appendix. (A minimal sketch of this update appears after the table.) |
| Open Source Code | Yes | All source code and documentations are available at https://github.com/LittleYUYU/Interactive-Semantic-Parsing. |
| Open Datasets | Yes | We utilize the 291,285 <recipe, description> pairs collected by Ur et al. (2016) for training and the 3,870 pairs from Quirk, Mooney, and Galley (2015) for testing. |
| Dataset Splits | Yes | 20% of the training data are randomly sampled as a validation set. (An illustrative split is sketched after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions models and algorithms (e.g., "Latent Attention Model (LAM)", "bidirectional GRU-RNN", "REINFORCE algorithm"), but does not specify any software dependencies with version numbers (e.g., Python packages, libraries, or frameworks). |
| Experiment Setup | Yes | The word vector dimension is set at 50, the weight factor wd is 0.5, and the discount factor γ is 0.99. The maximum number of turns is capped at 5 (Max Local Turn) and 4 (Max Global Turn), which allows at most four subtasks. β trades off parsing accuracy against the number of questions: a larger β trains an agent to ask fewer questions at the cost of accuracy, while a lower β leads to more questions and likely more accurate parses. Using the validation set, we experimented with β ∈ {0.3, 0.4, 0.5} and observed that with β = 0.3 the number of questions raised by the HRL-based agents is still reasonable compared with LAM-rule/sup, while its parsing accuracy is much higher. (A sketch of how β could enter the reward appears after the table.) |
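
The Pseudocode row cites REINFORCE (Williams 1992) over stochastic policies. As a hedged illustration only (the `Policy` class, network shape, and `reinforce_update` helper are assumptions, not the authors' code), a minimal policy-gradient update might look like:

```python
# Minimal REINFORCE sketch (Williams 1992): maximize E[ sum_t log pi(a_t|s_t) * G_t ].
# Everything here is illustrative; it is NOT the authors' implementation.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Toy stochastic policy: maps a state vector to a distribution over actions."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(state))

def reinforce_update(policy: Policy, optimizer: torch.optim.Optimizer,
                     trajectory, gamma: float = 0.99) -> None:
    """One REINFORCE step over a trajectory of (state, action, reward) triples."""
    # Compute discounted returns G_t, back to front.
    returns, G = [], 0.0
    for _, _, reward in reversed(trajectory):
        G = reward + gamma * G
        returns.append(G)
    returns.reverse()
    # Policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t.
    loss = torch.zeros(())
    for (state, action, _), G in zip(trajectory, returns):
        loss = loss - policy(state).log_prob(action) * G
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A trajectory here would be, e.g., `[(torch.randn(8), torch.tensor(2), 0.0), ...]` with `optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)`; per the quoted passage, the paper applies such updates to stochastic policies over both subtasks and actions.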
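
The Dataset Splits row states that 20% of the training pairs are randomly sampled for validation. A sketch of such a split (the function name and seed are assumptions; only the 80/20 ratio comes from the paper):

```python
import random

def split_train_valid(pairs, valid_fraction=0.2, seed=0):
    """Randomly hold out a fraction of <recipe, description> pairs for validation."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_fraction)
    return pairs[n_valid:], pairs[:n_valid]  # (train, valid)
```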
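
The Experiment Setup row describes β as a trade-off between parsing accuracy and the number of questions asked. One way this could enter the reward, assuming (an assumption, not a detail the quote confirms) a per-question penalty of -β plus a terminal reward for a correct parse:

```python
def step_reward(asked_question: bool, episode_done: bool,
                parse_correct: bool, beta: float = 0.3) -> float:
    """Hypothetical reward shaping: each question costs beta; a correct parse pays 1."""
    reward = -beta if asked_question else 0.0   # larger beta discourages questions
    if episode_done and parse_correct:
        reward += 1.0                           # terminal reward for an accurate parse
    return reward
```

This matches the quoted behavior qualitatively: raising β makes questions costlier, so the agent asks fewer of them at some cost in parsing accuracy.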