Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines

Authors: Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, Gerald Tesauro, Kartik Talamadupula, Mrinmaya Sachan, Murray Campbell

AAAI 2021, pp. 9018-9027 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we report the results of our experiments on the TWC games.
Researcher Affiliation | Collaboration | IBM Research, EPFL, TTI Chicago, ETH Zurich
Pseudocode | No | The paper describes the components of the framework in prose and with a block diagram (Figure 3) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and data can be found at https://github.com/IBM/commonsense-rl.
Open Datasets | Yes | Code and data can be found at https://github.com/IBM/commonsense-rl.
Dataset Splits | No | The paper specifies a training set and two test sets (in-distribution and out-of-distribution) but does not explicitly mention a separate validation set used during training.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or the types of computing resources used for the experiments.
Software Dependencies | No | The paper mentions software components and models such as GloVe, Numberbatch, BERT, GPT-2, and spaCy, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | Each agent is trained for 100 episodes and the results are averaged over 10 runs. Following one of the winning strategies in the First TextWorld Competition (Adolphs and Hofmann 2019), we use the Advantage Actor-Critic framework (Mnih et al. 2016) to train the agents using reward signals from the training games. (A minimal training-loop sketch follows the table.)
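To make the reported protocol concrete, below is a minimal, hypothetical sketch of the training loop described in the Experiment Setup row: 100 training episodes per agent, results averaged over 10 independent runs, with an advantage actor-critic (A2C) update. The environment and model names (`TWCEnv`, `ActorCritic`) are placeholders, not the authors' implementation from the linked repository, and the discount factor `GAMMA` is an assumption the table does not specify.

```python
import torch
import torch.nn.functional as F

NUM_RUNS = 10        # results averaged over 10 runs, as reported
NUM_EPISODES = 100   # each agent trained for 100 episodes
GAMMA = 0.9          # assumption: discount factor not given in this table

def train_one_run(env, agent, optimizer):
    """Train a single agent for NUM_EPISODES and return its per-episode scores."""
    episode_scores = []
    for _ in range(NUM_EPISODES):
        obs, done, score = env.reset(), False, 0.0
        log_probs, values, rewards = [], [], []
        while not done:
            dist, value = agent(obs)          # policy over admissible text actions
            action = dist.sample()
            obs, reward, done, _ = env.step(action)
            log_probs.append(dist.log_prob(action))
            values.append(value)
            rewards.append(reward)
            score += reward
        # Discounted returns, accumulated backwards over the episode.
        returns, running = [], 0.0
        for r in reversed(rewards):
            running = r + GAMMA * running
            returns.insert(0, running)
        returns = torch.tensor(returns)
        values = torch.cat(values)
        advantage = returns - values.detach()
        # A2C loss: advantage-weighted policy gradient plus value regression.
        loss = -(torch.stack(log_probs) * advantage).sum() + F.mse_loss(values, returns)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        episode_scores.append(score)
    return episode_scores

# Averaging over 10 independent runs, as reported in the paper:
# scores = [train_one_run(TWCEnv(), agent, optimizer) for _ in range(NUM_RUNS)]
```

This sketch only illustrates the episode/run structure and the reward-driven A2C update; the authors' actual agent architecture and environment wrappers are in the repository linked above.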