Case-based reasoning for better generalization in textual reinforcement learning

Authors: Mattia Atzeni, Shehzaad Zuzar Dhuliawala, Keerthiram Murugesan, Mrinmaya Sachan

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed approach consistently improves existing methods, obtains good out-of-distribution generalization, and achieves new state-of-the-art results on widely used environments. This section provides a detailed evaluation of our approach. We assess quantitatively the performance of CBR combined with existing RL approaches and we demonstrate its capability to improve sample efficiency and generalize out of the training distribution.
Researcher Affiliation | Collaboration | Mattia Atzeni (IBM Research, EPFL) atz@zurich.ibm.com; Shehzaad Dhuliawala (ETH Zürich) shehzaad.dhuliawala@inf.ethz.ch; Keerthiram Murugesan (IBM Research) keerthiram.murugesan@ibm.com; Mrinmaya Sachan (ETH Zürich) mrinmaya.sachan@inf.ethz.ch
Pseudocode | Yes | Algorithm 1: CBR in Text-based RL. (A hedged sketch of the retrieve-and-reuse step appears after the table.)
Open Source Code | No | The paper does not include an explicit statement about releasing its source code or provide a link to a code repository.
Open Datasets | Yes | We empirically verify the efficacy of our approach on TextWorld Commonsense (TWC) (Murugesan et al., 2021b) and Jericho (Hausknecht et al., 2020). (An assumed environment-loading snippet appears after the table.)
Dataset Splits | Yes | TWC allows agents to be tested on two settings: the in-distribution games, where the objects that the agent encounters in the test set are the same as the objects in the training set, and the out-of-distribution games, which have no entity in common with the training set. Table 1 reports the results on TWC for the in-distribution set of games; Table 2 reports the test-set performance for TWC out-of-distribution games. (A minimal split-check sketch appears after the table.)
Hardware Specification | Yes | CPU: Intel(R) Xeon(R) E5-2690 v4 @ 2.60GHz; Memory: 128 GB; GPUs: 1 x NVIDIA Tesla K80 (12 GB); Disk1: 100 GB; Disk2: 600 GB; OS: Ubuntu 18.04-64 Minimal for VSI
Software Dependencies | No | The paper lists the operating system 'Ubuntu 18.04-64' and mentions using a pre-trained BERT model, but it does not provide specific version numbers for other key software libraries or frameworks used in the experiments (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | We set the hidden dimensionality of the model to d = 768 and we use 12 attention heads for the graph attention network, each applied to 64-dimensional inputs. We use nl = 2 seeded GAT layers for TWC and nl = 3 for Jericho. On both datasets, we apply a dropout regularization on the seeded GAT with probability of 0.1 at each layer. ... For Jericho, we set k = 3... The retriever threshold is kept constant to τ = 0.7 across all experiments. On TWC, we train the agents for 100 episodes and a maximum of 50 steps for each episode. On Jericho, as mentioned, we follow previous work and we train for 100 000 valid steps... We set the discount factor γ to 0.9 on all experiments. (These values are collected in a configuration sketch after the table.)
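
The paper's Algorithm 1 is not reproduced in this report, so the following is only a minimal Python sketch of the retrieve-and-reuse idea it names, under the assumption that the agent keeps a memory of past (state representation, action) cases and reuses actions from cases whose similarity to the current state exceeds the retriever threshold τ. The names `CaseMemory`, `write`, and `retrieve` are illustrative, not the authors' code.

```python
import numpy as np


class CaseMemory:
    """Illustrative case memory for CBR in text-based RL (not the authors' implementation).

    Stores (state embedding, action) pairs and retrieves past cases whose
    cosine similarity to the current state exceeds a threshold tau.
    """

    def __init__(self, tau: float = 0.7, k: int = 3):
        self.tau = tau        # retriever threshold (tau = 0.7 in the paper)
        self.k = k            # number of cases reused (k = 3 for Jericho)
        self.embeddings = []  # state embedding of each stored case
        self.actions = []     # action taken in each stored case

    def write(self, state_emb: np.ndarray, action: str) -> None:
        """Store a new case after the agent acts."""
        self.embeddings.append(state_emb)
        self.actions.append(action)

    def retrieve(self, state_emb: np.ndarray) -> list[str]:
        """Return actions of the top-k most similar past cases above tau."""
        if not self.embeddings:
            return []
        mem = np.stack(self.embeddings)  # shape (N, d)
        sims = mem @ state_emb / (
            np.linalg.norm(mem, axis=1) * np.linalg.norm(state_emb) + 1e-8
        )
        top = np.argsort(-sims)[: self.k]
        return [self.actions[i] for i in top if sims[i] >= self.tau]
```

How the retrieved actions are combined with the policy's own action scores is specified by the paper's Algorithm 1, not by this sketch.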
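Both benchmarks ship as public Python packages (TWC games are played through the textworld package; Jericho games through the jericho package). The snippet below is an assumed usage sketch of jericho's FrotzEnv interface; the game-file path is a placeholder and exact signatures may differ across package versions.

```python
from jericho import FrotzEnv  # pip install jericho

# Placeholder path to a Jericho-supported game file.
env = FrotzEnv("roms/zork1.z5")

obs, info = env.reset()
valid_actions = env.get_valid_actions()  # template-based valid action set
obs, reward, done, info = env.step(valid_actions[0])
print(reward, done)
```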
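The out-of-distribution criterion quoted above (a test game shares no entity with the training games) amounts to a simple set check. The sketch below assumes each game exposes its entity names as a set; the function name and the example entities are hypothetical.

```python
def is_out_of_distribution(test_game_entities: set[str],
                           train_entities: set[str]) -> bool:
    """True if the test game shares no entity with the training set."""
    return test_game_entities.isdisjoint(train_entities)


# Hypothetical example: a test game over unseen objects counts as
# out-of-distribution, one reusing a training object does not.
train_entities = {"dirty sock", "fridge", "kitchen cupboard"}
assert is_out_of_distribution({"wet towel", "wardrobe"}, train_entities)
assert not is_out_of_distribution({"fridge", "apple"}, train_entities)
```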
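For readability, the hyperparameters quoted in the Experiment Setup row are collected in a single configuration sketch. The dataclass and field names are ours, not the paper's; the values follow the quote, with the GAT depth set per dataset.

```python
from dataclasses import dataclass


@dataclass
class CBRConfig:
    """Hyperparameters reported in the paper (field names are illustrative)."""
    hidden_dim: int = 768        # model dimensionality d
    num_heads: int = 12          # attention heads in the seeded GAT
    head_dim: int = 64           # per-head input dimensionality
    gat_layers: int = 2          # nl = 2 for TWC, nl = 3 for Jericho
    dropout: float = 0.1         # dropout on the seeded GAT, each layer
    k_cases: int = 3             # cases reused on Jericho
    retriever_tau: float = 0.7   # retriever threshold, all experiments
    gamma: float = 0.9           # discount factor
    # Training budget: 100 episodes x up to 50 steps on TWC;
    # 100,000 valid steps on Jericho (following prior work).
    twc_episodes: int = 100
    twc_max_steps: int = 50
    jericho_valid_steps: int = 100_000


twc_config = CBRConfig(gat_layers=2)
jericho_config = CBRConfig(gat_layers=3)
```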