Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines

Authors: Keerthiram Murugesan, Mattia Atzeni, Pavan Kapanipathi, Pushkar Shukla, Sadhana Kumaravel, Gerald Tesauro, Kartik Talamadupula, Mrinmaya Sachan, Murray Campbell (pp. 9018-9027)

AAAI 2021 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we report the results of our experiments on the TWC games." |
| Researcher Affiliation | Collaboration | IBM Research, EPFL, TTI Chicago, ETH Zurich |
| Pseudocode | No | The paper describes the components of the framework in prose and with a block diagram (Figure 3) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code and data can be found at https://github.com/IBM/commonsense-rl." |
| Open Datasets | Yes | "Code and data can be found at https://github.com/IBM/commonsense-rl." |
| Dataset Splits | No | The paper specifies a 'training set and two test sets' (in-distribution and out-of-distribution) but does not explicitly mention a separate 'validation' set for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or types of computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions various software components and models like GloVe, Numberbatch, BERT, GPT2, and spaCy, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | "Each agent is trained for 100 episodes and the results are averaged over 10 runs. Following one of the winning strategies in the First Text World Competition (Adolphs and Hofmann 2019), we use the Advantage Actor-Critic framework (Mnih et al. 2016) to train the agents using reward signals from the training games." |
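For context on the Experiment Setup row: the Advantage Actor-Critic framework (Mnih et al. 2016) cited there updates the policy using advantages, i.e., discounted returns minus the critic's value estimates. The sketch below is an illustrative, minimal implementation of that advantage computation only; it is not the paper's code, and all inputs (rewards, values, gamma) are made-up example data.

```python
# Illustrative sketch of the core A2C quantities (not the paper's code).
# The actor is updated with grad log pi(a_t|s_t) * A_t; the critic
# regresses V(s_t) toward the return G_t.

def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1},
    seeding the recursion with an optional bootstrap value."""
    returns = []
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99, bootstrap=0.0):
    """Advantage A_t = G_t - V(s_t) for each step of an episode."""
    returns = discounted_returns(rewards, gamma, bootstrap)
    return [g - v for g, v in zip(returns, values)]

# Example: a 3-step episode with a single terminal reward and
# a constant (hypothetical) value estimate of 0.5 at every step.
rews = [0.0, 0.0, 1.0]
vals = [0.5, 0.5, 0.5]
print(advantages(rews, vals, gamma=1.0))  # [0.5, 0.5, 0.5]
```

With gamma = 1.0 every step's return is the terminal reward, so each advantage is simply 1.0 minus the value estimate; with gamma < 1.0 earlier steps receive geometrically discounted credit.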