Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines
Authors: Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, Gerald Tesauro, Kartik Talamadupula, Mrinmaya Sachan, Murray Campbell
AAAI 2021, pp. 9018-9027 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report the results of our experiments on the TWC games. |
| Researcher Affiliation | Collaboration | IBM Research, EPFL, TTI Chicago, ETH Zurich |
| Pseudocode | No | The paper describes the components of the framework in prose and with a block diagram (Figure 3) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data can be found at https://github.com/IBM/commonsense-rl. |
| Open Datasets | Yes | Code and data can be found at https://github.com/IBM/commonsense-rl. |
| Dataset Splits | No | The paper specifies a 'training set and two test sets' (in-distribution and out-of-distribution) but does not explicitly mention a separate validation set used during model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or types of computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions various software components and models like GloVe, Numberbatch, BERT, GPT2, and spaCy, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Each agent is trained for 100 episodes and the results are averaged over 10 runs. Following one of the winning strategies in the First Text World Competition (Adolphs and Hofmann 2019), we use the Advantage Actor-Critic framework (Mnih et al. 2016) to train the agents using reward signals from the training games. |
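The Experiment Setup row quotes the paper's protocol: agents trained with the Advantage Actor-Critic (A2C) framework for 100 episodes, with results averaged over 10 runs. The sketch below illustrates what such a training loop could look like; it is not taken from the authors' released code at https://github.com/IBM/commonsense-rl, and the `TextGameEnv`-style environment interface, the bag-of-words encoder, and all hyperparameters are illustrative placeholders.

```python
# Minimal sketch of the reported protocol: A2C training for 100 episodes,
# results averaged over 10 independent runs. The environment API, encoder,
# and hyperparameters below are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # simple bag-of-words text encoder
        self.state_enc = nn.Linear(embed_dim, hidden_dim)
        self.action_enc = nn.Linear(embed_dim, hidden_dim)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_ids, admissible_ids):
        # Score each admissible text command against the encoded observation.
        state = torch.relu(self.state_enc(self.embed(obs_ids.unsqueeze(0))))
        actions = torch.stack(
            [torch.relu(self.action_enc(self.embed(a.unsqueeze(0)))).squeeze(0)
             for a in admissible_ids])
        logits = actions @ state.squeeze(0)
        return logits, self.value_head(state).squeeze()

def run_training(env, model, episodes=100, gamma=0.9, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    episode_scores = []
    for _ in range(episodes):
        obs, admissible = env.reset()               # hypothetical env interface
        done, total = False, 0.0
        log_probs, values, rewards = [], [], []
        while not done:
            logits, value = model(obs, admissible)
            dist = torch.distributions.Categorical(logits=logits)
            act = dist.sample()
            obs, admissible, reward, done = env.step(act.item())
            log_probs.append(dist.log_prob(act))
            values.append(value)
            rewards.append(float(reward))
            total += reward
        # Discounted returns and advantages for the A2C update.
        returns, R = [], 0.0
        for r in reversed(rewards):
            R = r + gamma * R
            returns.insert(0, R)
        returns = torch.tensor(returns, dtype=torch.float32)
        values = torch.stack(values)
        advantages = returns - values.detach()
        policy_loss = -(torch.stack(log_probs) * advantages).sum()
        value_loss = F.mse_loss(values, returns)
        opt.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        opt.step()
        episode_scores.append(total)
    return episode_scores

# The paper averages such scores over 10 independent runs.
```

The bag-of-words encoder above stands in for the richer text representations the paper mentions (GloVe, Numberbatch, BERT, GPT2); the actual TWC agents combine those with commonsense knowledge components described in the paper and repository.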