Graph Constrained Reinforcement Learning for Natural Language Action Spaces

Authors: Prithviraj Ammanabrolu, Matthew Hausknecht

ICLR 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: "Results across a wide variety of IF games show that KG-A2C outperforms current IF agents despite the exponential increase in action space size. We then conduct an empirical study evaluating our agent across a diverse set of IF games followed by an ablation analysis studying the effectiveness of various components of our algorithm as well as its overall generalizability."
Researcher Affiliation: Collaboration
LLM Response: "Prithviraj Ammanabrolu (Georgia Institute of Technology, raj.ammanabrolu@gatech.edu); Matthew Hausknecht (Microsoft Research, matthew.hausknecht@microsoft.com)"
Pseudocode: No
LLM Response: "The paper describes the architecture and training process of KG-A2C, including mathematical formulations, but does not present a formal pseudocode block or algorithm steps."
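Since the paper presents no algorithm block, the sketch below is a hedged pseudocode reconstruction of one KG-A2C update, assembled from the paper's prose description: every function name is hypothetical, and the control flow is this report's interpretation rather than the authors' code.

```python
# Hypothetical pseudocode for one KG-A2C update, reconstructed from the
# paper's description; none of these names come from the released code.
def kg_a2c_update(envs, agent, optimizer, rollout_length=8):
    transitions = []
    for _ in range(rollout_length):
        for env in envs:
            # Encode the observation text together with the knowledge graph.
            state = agent.encode(env.observation, env.knowledge_graph)
            # Decode a template, then objects masked by the knowledge graph
            # (the paper's graph constraint on the object decoder).
            template, t_logprob = agent.sample_template(state)
            objects, o_logprob = agent.sample_objects(
                state, template, mask=env.knowledge_graph)
            action = template.format(*objects)
            reward, done = env.step(action)
            # Update the graph from the new observation (the paper uses OpenIE).
            env.knowledge_graph = agent.update_graph(env.observation)
            transitions.append(
                (t_logprob + o_logprob, agent.value(state), reward, done))
    # Standard A2C objective: policy gradient weighted by the advantage,
    # a squared-error value loss, and an entropy bonus.
    loss = agent.a2c_loss(transitions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```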
Open Source Code: Yes
LLM Response: "Code available at https://github.com/rajammanabrolu/KG-A2C"
Open Datasets: Yes
LLM Response: "This method predicts the most likely sequence of subword tokens for a given input using a unigram language model which, in our case, is trained on a dataset of human playthroughs of IF games and contains a total vocabulary of size 8000." (playthrough data: http://www.allthingsjacq.com/interactive_fiction.html#clubfloyd)
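The paper does not name a tokenization library. As a hedged sketch, a unigram subword model with the stated 8000-token vocabulary could be trained on those transcripts with sentencepiece, one common implementation; "playthroughs.txt" is a hypothetical local copy of the linked data.

```python
import sentencepiece as spm

# Train a unigram subword model with the vocabulary size the paper states.
# sentencepiece is an assumption: the paper only says "unigram language model".
spm.SentencePieceTrainer.train(
    input="playthroughs.txt",  # hypothetical local dump of the ClubFloyd transcripts
    model_prefix="if_unigram",
    vocab_size=8000,
    model_type="unigram",
)

# Tokenize a classic IF observation with the trained model.
sp = spm.SentencePieceProcessor(model_file="if_unigram.model")
print(sp.encode("You are standing in an open field west of a white house.",
                out_type=str))
```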
Dataset Splits: No
LLM Response: "The paper mentions that 'Hyperparameters for all agents were tuned on the game of Zork1', implying a validation process, but it does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproduction."
Hardware Specification: No
LLM Response: "The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments."
Software Dependencies: No
LLM Response: "The paper mentions tools and methods such as Stanford's Open Information Extraction (OpenIE) and its adaptation of the Advantage Actor Critic (A2C) method, but does not provide specific version numbers for software dependencies or libraries."
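Because no versions are pinned, anyone reproducing the knowledge-graph construction must pick an OpenIE setup themselves. The sketch below shows one plausible route, assuming stanza's CoreNLPClient with a local Stanford CoreNLP installation (CORENLP_HOME must point at it); it is illustrative, not the authors' pipeline.

```python
from stanza.server import CoreNLPClient

# Extract (subject, relation, object) triples of the kind the paper feeds
# into its knowledge graph. Requires a local CoreNLP install (CORENLP_HOME).
text = "You are standing in an open field west of a white house."
annotators = ["tokenize", "ssplit", "pos", "lemma", "depparse", "natlog", "openie"]
with CoreNLPClient(annotators=annotators, be_quiet=True) as client:
    ann = client.annotate(text)
    for sentence in ann.sentence:
        for triple in sentence.openieTriple:
            print(triple.subject, "--", triple.relation, "->", triple.object)
```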
Experiment Setup: Yes
LLM Response: "Episodes are terminated after 100 valid steps or game over/victory. All A2C based agents are trained using data collected from 32 parallel environments. Hyperparameters for all agents were tuned on the game of Zork1 and held constant across all other games. Hyperparameters are provided in Appendix C."
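As a concrete illustration of that rollout setup, the sketch below instantiates 32 parallel environments with a 100-step episode cap using Jericho's FrotzEnv, the interactive-fiction interface the released KG-A2C code builds on. The ROM path is hypothetical, and for brevity it counts every step rather than only the valid steps the paper counts.

```python
from jericho import FrotzEnv

NUM_ENVS = 32    # parallel environments (from the paper)
MAX_STEPS = 100  # episode cap (the paper caps *valid* steps; all steps here)

# "zork1.z5" is a hypothetical path to a locally obtained game ROM.
envs = [FrotzEnv("zork1.z5") for _ in range(NUM_ENVS)]
obs = [env.reset()[0] for env in envs]
steps = [0] * NUM_ENVS

def step_all(actions):
    """Advance every environment one step, resetting any that terminate."""
    out = []
    for i, (env, act) in enumerate(zip(envs, actions)):
        ob, reward, done, info = env.step(act)
        steps[i] += 1
        if done or steps[i] >= MAX_STEPS:
            ob, _ = env.reset()
            steps[i] = 0
        out.append((ob, reward, done))
    return out
```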