Semantic HELM: A Human-Readable Memory for Reinforcement Learning

Authors: Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter

NeurIPS 2023

Reproducibility checklist. Each entry lists the variable assessed, the result, and the LLM response citing supporting evidence from the paper.

Research Type: Experimental
LLM Response: We train our memory mechanism on a set of partially observable environments and find that it excels on tasks that require a memory component, while mostly attaining performance on-par with strong baselines on tasks that do not. On a challenging continuous recognition task, where memorizing the past is crucial, our memory mechanism converges two orders of magnitude faster than prior methods.

Researcher Affiliation: Academia
LLM Response: Fabian Paischer (1), Thomas Adler (1), Markus Hofmarcher (2), Sepp Hochreiter (1); (1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, (2) JKU LIT SAL eSPML Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria; paischer@ml.jku.at

Pseudocode: No
LLM Response: The paper includes architectural diagrams and mathematical equations, but does not contain any structured pseudocode or clearly labeled algorithm blocks.

Open Source Code: Yes
LLM Response: We make all our code and random seeds used in our experiments, as well as obtained results publicly available at https://github.com/ml-jku/helm.

Open Datasets: Yes
LLM Response: First, we investigate in Section 3.1 whether CLIP can extract semantics of artificial scenes. Next, we train SHELM on four different environments, namely MiniGrid (Chevalier-Boisvert et al., 2018), MiniWorld (Chevalier-Boisvert, 2018), Avalon (Albrecht et al., 2022), and Psychlab (Leibo et al., 2018).
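
All four benchmarks are openly available as Python packages. For orientation, a minimal sketch of rolling out a random policy in a MiniGrid task follows; the environment ID, the gym_minigrid import, and the classic 4-tuple gym step API are assumptions about the installed versions, not details taken from the paper.

    import gym
    import gym_minigrid  # registers the MiniGrid-* environment IDs with gym (assumed package/version)

    env = gym.make("MiniGrid-DoorKey-6x6-v0")  # hypothetical choice of task
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()          # random policy, for illustration only
        obs, reward, done, info = env.step(action)  # classic 4-tuple gym step API (pre-gymnasium)
    env.close()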

Dataset Splits: No
LLM Response: For Avalon, the paper states that "The final performance of an agent is measured in terms of mean human normalized scores on a curated set of 1000 test worlds". However, it does not provide explicit proportions or counts for training and validation splits, as data in RL environments is typically generated dynamically through interaction.

Hardware Specification: Yes
LLM Response: All our experiments were run on either a single GTX1080Ti or a single A100 GPU. ... These experiments were run on a single GTX1080Ti. For Avalon, we used a single A100 for training, where one run to train for 10M interaction steps takes approximately 15 hours.

Software Dependencies: No
LLM Response: The paper mentions several software components such as PPO, Transformer-XL, and CLIP, and refers to the Hugging Face Transformers library, but it does not specify version numbers for these or for other key software dependencies such as Python, PyTorch, or TensorFlow.
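
Because no versions are pinned, reproducing the setup requires choosing them independently. The sketch below assumes the pretrained components are loaded through the Hugging Face Transformers library; the checkpoint names (openai/clip-vit-base-patch16, transfo-xl-wt103) and the continued presence of the Transfo-XL classes in the installed transformers release are assumptions, not details from this excerpt.

    # Sketch of loading the pretrained components through Hugging Face Transformers.
    # Checkpoint names and the availability of the Transfo-XL classes in the installed
    # transformers release are assumptions, not taken from the paper.
    from transformers import CLIPModel, CLIPProcessor, TransfoXLModel, TransfoXLTokenizer

    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")          # vision-language component
    clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
    memory_lm = TransfoXLModel.from_pretrained("transfo-xl-wt103")            # pretrained language-model memory
    memory_tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")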

Experiment Setup: Yes
LLM Response: We adapt the hyperparameter search from Paischer et al. (2022a). Particularly, we search for learning rate in {5e-4, 3e-4, 1e-5, 5e-5}, entropy coefficient in {0.05, 0.01, 0.005, 0.001}, rollout length in {32, 64, 128} for SHELM. ... Further we search over learning rate in {2.5e-4, 1e-4, 7e-5}, and the number of retrieved tokens in {1, 2, 4}. ... For SHELM on continuous-recognition we only retrieve the closest token for an observation. ... Further we use 64 actors and set the rollout size to 256.
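
Read as a search space, the reported grids are small enough to enumerate exhaustively. Below is a minimal sketch of such a sweep; the dictionary layout and the train_shelm entry point are illustrative assumptions, not code from the released repository.

    from itertools import product

    # Reported SHELM search ranges (learning rate, entropy coefficient, rollout length).
    shelm_grid = {
        "learning_rate": [5e-4, 3e-4, 1e-5, 5e-5],
        "entropy_coef": [0.05, 0.01, 0.005, 0.001],
        "rollout_length": [32, 64, 128],
    }

    # Additional reported ranges (learning rate and number of retrieved tokens).
    retrieval_grid = {
        "learning_rate": [2.5e-4, 1e-4, 7e-5],
        "n_retrieved_tokens": [1, 2, 4],
    }

    def enumerate_configs(grid):
        """Yield one config dict per point in the Cartesian product of the grid."""
        for values in product(*grid.values()):
            yield dict(zip(grid.keys(), values))

    shelm_configs = list(enumerate_configs(shelm_grid))          # 4 * 4 * 3 = 48 configurations
    retrieval_configs = list(enumerate_configs(retrieval_grid))  # 3 * 3 = 9 configurations

    for config in shelm_configs:
        print(config)  # train_shelm(config)  # hypothetical training entry point, one run per configuration

Each configuration then corresponds to one PPO training run with the rollout, actor, and retrieval settings quoted above.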