Memory Gym: Partially Observable Challenges to Memory-Based Agents
Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results based on Proximal Policy Optimization (PPO) and Gated Recurrent Unit (GRU) underline the strong memory dependency of the contributed environments. |
| Researcher Affiliation | Academia | TU Dortmund University; Rhine-Waal University of Applied Sciences; LIACS, Universiteit Leiden |
| Pseudocode | No | The paper includes diagrams of model architecture (Figure 15) and data processing (Figure 16) but does not provide structured pseudocode or algorithm blocks explicitly labeled as such. |
| Open Source Code | Yes | Source Code: https://github.com/MarcoMeter/drl-memory-gym/ and Source Code: https://github.com/MarcoMeter/episodic-transformer-memory-ppo |
| Open Datasets | Yes | We propose Memory Gym as a novel and open source benchmark consisting of three unique environments: Mortar Mayhem, Mystery Path, and Searing Spotlights. ... All environments are procedurally generated to evaluate the agent's ability to generalize to unseen levels (or seeds). |
| Dataset Splits | Yes | All training runs utilize 100,000 environment seeds. Generalization is assessed on 30 novel seeds, which are repeated 5 times. Hence, each data point aggregates 750 episodes. |
| Hardware Specification | Yes | These experiments are run on an NVIDIA A100 Tensor-Core-GPU and an AMD EPYC 7542 CPU (32 cores). |
| Software Dependencies | Yes | Memory Gym's significant dependencies are gym (Brockman et al., 2016) and PyGame. This allows Memory Gym to be easily set up and executed on Linux, macOS, and Windows. ... # Create Anaconda environment $ conda create -n memory-gym python=3.7 --yes |
| Experiment Setup | Yes | Table 6: These are the hyperparameters that we used for all training runs. The sequence length is dynamically set by the longest episode inside the batch of the gathered training data. The experiments on Searing Spotlights and Mystery Path utilized a fixed sequence length of 128. The learning rate and the entropy coefficient decay linearly. (Table 6 then lists specific values for each hyperparameter: Training Seeds 100000, Worker Steps 512, Batch Size 16384, etc.) |
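The linearly decaying learning rate and entropy coefficient described in Table 6 can be sketched as a simple annealing schedule. This is a minimal illustration: the `linear_decay` helper and its initial/final values are assumptions for demonstration, not the paper's exact schedule; only the config values marked as quoted come from Table 6.

```python
def linear_decay(initial: float, final: float, step: int, total_steps: int) -> float:
    """Linearly anneal a coefficient from `initial` to `final` over `total_steps`.

    Values are clamped so the schedule stays at `final` once training finishes.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial + (final - initial) * frac


# Hyperparameters quoted from Table 6 of the paper; the full table lists more.
config = {
    "training_seeds": 100_000,  # quoted: Training Seeds 100000
    "worker_steps": 512,        # quoted: Worker Steps 512
    "batch_size": 16_384,       # quoted: Batch Size 16384
    "sequence_length": 128,     # quoted: fixed for Searing Spotlights / Mystery Path
}

# Illustrative decay: the concrete initial/final values here are placeholders,
# not taken from the paper.
lr = linear_decay(initial=3e-4, final=1e-5, step=500, total_steps=1_000)
```

The same helper would drive the entropy-coefficient decay; per the quoted setup, the sequence length is otherwise set dynamically to the longest episode in each training batch.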