Case-based Reasoning for Better Generalization in Textual Reinforcement Learning
Authors: Mattia Atzeni, Shehzaad Zuzar Dhuliawala, Keerthiram Murugesan, Mrinmaya Sachan
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed approach consistently improves existing methods, obtains good out-of-distribution generalization, and achieves new state-of-the-art results on widely used environments. This section provides a detailed evaluation of our approach. We quantitatively assess the performance of CBR combined with existing RL approaches and demonstrate its capability to improve sample efficiency and generalize out of the training distribution. |
| Researcher Affiliation | Collaboration | Mattia Atzeni (IBM Research, EPFL) atz@zurich.ibm.com; Shehzaad Dhuliawala (ETH Zürich) shehzaad.dhuliawala@inf.ethz.ch; Keerthiram Murugesan (IBM Research) keerthiram.murugesan@ibm.com; Mrinmaya Sachan (ETH Zürich) mrinmaya.sachan@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1: CBR in Text-based RL (a hedged sketch of this loop is given after the table) |
| Open Source Code | No | The paper does not include an explicit statement about releasing its source code or provide a link to a code repository. |
| Open Datasets | Yes | We empirically verify the efficacy of our approach on TextWorld Commonsense (TWC) (Murugesan et al., 2021b) and Jericho (Hausknecht et al., 2020). |
| Dataset Splits | Yes | TWC allows agents to be tested in two settings: the in-distribution games, where the objects the agent encounters in the test set are the same as the objects in the training set, and the out-of-distribution games, which have no entity in common with the training set. Table 1 reports the results on TWC for the in-distribution set of games; Table 2 reports test-set performance for the TWC out-of-distribution games. |
| Hardware Specification | Yes | CPU: Intel(R) Xeon(R) E5-2690 v4 @ 2.60 GHz; Memory: 128 GB; GPU: 1 × NVIDIA Tesla K80 (12 GB); Disk 1: 100 GB; Disk 2: 600 GB; OS: Ubuntu 18.04 (64-bit) |
| Software Dependencies | No | The paper lists the operating system 'Ubuntu 18.04-64' and mentions using a pre-trained BERT model, but it does not provide version numbers for the other key software libraries or frameworks used in the experiments (e.g., PyTorch or TensorFlow). |
| Experiment Setup | Yes | We set the hidden dimensionality of the model to d = 768 and we use 12 attention heads for the graph attention network, each applied to 64-dimensional inputs. We use nl = 2 seeded GAT layers for TWC and nl = 3 for Jericho. On both datasets, we apply dropout regularization on the seeded GAT with probability 0.1 at each layer. ... For Jericho, we set k = 3 ... The retriever threshold is kept constant at τ = 0.7 across all experiments. On TWC, we train the agents for 100 episodes and a maximum of 50 steps per episode. On Jericho, as mentioned, we follow previous work and train for 100,000 valid steps ... We set the discount factor γ to 0.9 in all experiments. (These hyperparameters are collected in the configuration sketch after the table.) |
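The "Pseudocode" row above points to Algorithm 1 (CBR in text-based RL) but the table only names it. Below is a minimal Python sketch of the retrieve-reuse-retain loop that such an algorithm describes, using the τ = 0.7 retriever threshold from the experiment setup. The names `CaseMemory`, `encode_state`, `similarity`, and `policy` are our own illustrative assumptions, not identifiers from the paper, and the agent's revision/update step is omitted.

```python
class CaseMemory:
    """Illustrative case store holding (state embedding, action, outcome) triples."""

    def __init__(self, threshold=0.7):
        self.cases = []             # list of (embedding, action, outcome)
        self.threshold = threshold  # retriever threshold tau from the paper

    def retrieve(self, embedding, similarity):
        """Return the best stored case whose similarity clears the threshold, or None."""
        best, best_sim = None, self.threshold
        for case in self.cases:
            sim = similarity(embedding, case[0])
            if sim >= best_sim:
                best, best_sim = case, sim
        return best

    def retain(self, embedding, action, outcome):
        """Store a new experience; the actual agent's retention policy is more involved."""
        self.cases.append((embedding, action, outcome))


def cbr_episode(env, encode_state, similarity, policy, memory, max_steps=50):
    """Run one episode, reusing retrieved actions when a similar case exists."""
    state = env.reset()
    for _ in range(max_steps):
        emb = encode_state(state)              # e.g. a seeded-GAT state encoding
        case = memory.retrieve(emb, similarity)
        # Reuse the retrieved action if a case matched, else fall back to the base RL policy.
        action = case[1] if case is not None else policy(state)
        state, reward, done = env.step(action)
        memory.retain(emb, action, reward)
        if done:
            break


if __name__ == "__main__":
    # Toy usage with a dummy one-step environment and trivial encodings.
    class DummyEnv:
        def reset(self):
            return "start"

        def step(self, action):
            return "start", 1.0, True

    memory = CaseMemory(threshold=0.7)
    cbr_episode(
        DummyEnv(),
        encode_state=lambda s: s,
        similarity=lambda a, b: 1.0 if a == b else 0.0,
        policy=lambda s: "look",
        memory=memory,
    )
    print(len(memory.cases))  # -> 1
```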
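For reproduction purposes, the hyperparameters quoted in the "Experiment Setup" row can be gathered into a single configuration object. This is a sketch only: the values are the ones stated in the paper, but the class and field names are ours, not the authors'.

```python
from dataclasses import dataclass


@dataclass
class CBRConfig:
    """Hyperparameters quoted in the Experiment Setup row; field names are illustrative."""
    hidden_dim: int = 768               # model dimensionality d
    num_heads: int = 12                 # GAT attention heads, each on 64-dim inputs
    gat_layers_twc: int = 2             # seeded GAT layers n_l for TWC
    gat_layers_jericho: int = 3         # seeded GAT layers n_l for Jericho
    gat_dropout: float = 0.1            # dropout probability at each seeded GAT layer
    k_jericho: int = 3                  # k for Jericho
    retriever_threshold: float = 0.7    # tau, constant across all experiments
    twc_episodes: int = 100             # training episodes on TWC
    twc_max_steps: int = 50             # maximum steps per TWC episode
    jericho_valid_steps: int = 100_000  # training budget on Jericho (valid steps)
    gamma: float = 0.9                  # discount factor
```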