Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Authors: Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train our memory mechanism on a set of partially observable environments and find that it excels on tasks that require a memory component, while mostly attaining performance on par with strong baselines on tasks that do not. On a challenging continuous recognition task, where memorizing the past is crucial, our memory mechanism converges two orders of magnitude faster than prior methods. |
| Researcher Affiliation | Academia | Fabian Paischer¹, Thomas Adler¹, Markus Hofmarcher², Sepp Hochreiter¹. ¹ ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning; ² JKU LIT SAL eSPML Lab, Institute for Machine Learning; Johannes Kepler University, Linz, Austria. paischer@ml.jku.at |
| Pseudocode | No | The paper includes architectural diagrams and mathematical equations, but does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make all our code and random seeds used in our experiments, as well as obtained results publicly available at https://github.com/ml-jku/helm. |
| Open Datasets | Yes | First, we investigate in Section 3.1 whether CLIP can extract semantics of artificial scenes. Next, we train SHELM on four different environments, namely MiniGrid (Chevalier-Boisvert et al., 2018), MiniWorld (Chevalier-Boisvert, 2018), Avalon (Albrecht et al., 2022), and Psychlab (Leibo et al., 2018). (A hedged environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper states that "The final performance of an agent is measured in terms of mean human normalized scores on a curated set of 1000 test worlds" for Avalon. However, it does not provide explicit proportions or counts for training and validation splits for the datasets, as the data in RL environments is often generated dynamically through interaction. |
| Hardware Specification | Yes | All our experiments were run on either a single GTX1080Ti or a single A100 GPU. ... These experiments were run on a single GTX1080Ti. For Avalon, we used a single A100 for training where one run to train for 10 M interaction steps takes approximately 15 hours. |
| Software Dependencies | No | The paper mentions several software components like PPO, Transformer-XL, CLIP, and refers to the Hugging Face Transformers library, but it does not specify version numbers for these or other key software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We adapt the hyperparameter search from Paischer et al. (2022a). Particularly, we search for learning rate in {5e-4, 3e-4, 1e-5, 5e-5}, entropy coefficient in {0.05, 0.01, 0.005, 0.001}, rollout length in {32, 64, 128} for SHELM. ... Further we search over learning rate in {2.5e-4, 1e-4, 7e-5}, and the number of retrieved tokens in {1, 2, 4}. ... For SHELM on continuous-recognition we only retrieve the closest token for an observation. ... Further we use 64 actors and set the rollout size to 256. |
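
The Open Datasets row lists four benchmark suites. The sketch below shows one hedged way to instantiate two of them; the environment IDs, package names, and gym API shown are assumptions rather than details taken from the paper, and the released repository at https://github.com/ml-jku/helm remains the authoritative source for the exact setup.

```python
# Minimal sketch (not the authors' code): instantiating two of the listed
# benchmark environments. Environment IDs and package versions are assumptions;
# the reset() return value differs between older gym and gymnasium releases.
import gym
import gym_minigrid   # importing registers the MiniGrid-* environment IDs
import gym_miniworld  # importing registers the MiniWorld-* environment IDs

# A partially observable MiniGrid task; the agent only sees a local egocentric view.
minigrid_env = gym.make("MiniGrid-DoorKey-8x8-v0")
minigrid_obs = minigrid_env.reset()

# A 3D MiniWorld task rendered from a first-person perspective.
miniworld_env = gym.make("MiniWorld-MazeS3-v0")
miniworld_obs = miniworld_env.reset()
```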
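
The Experiment Setup row quotes a grid search over learning rate, entropy coefficient, and rollout length for SHELM. A minimal sketch of enumerating that grid is given below; only the searched values are taken from the quote, while the config keys and the `train_shelm` entry point are hypothetical placeholders, not the interface of the released code.

```python
from itertools import product

# Searched values as quoted in the Experiment Setup row; everything else
# (config keys, train_shelm) is a hypothetical placeholder.
learning_rates = [5e-4, 3e-4, 1e-5, 5e-5]
entropy_coefficients = [0.05, 0.01, 0.005, 0.001]
rollout_lengths = [32, 64, 128]

for lr, ent_coef, rollout_len in product(learning_rates, entropy_coefficients, rollout_lengths):
    config = {
        "learning_rate": lr,
        "entropy_coefficient": ent_coef,
        "rollout_length": rollout_len,
    }
    # train_shelm(config)  # placeholder; see https://github.com/ml-jku/helm for the real entry point
    print(config)
```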