Memory Gym: Partially Observable Challenges to Memory-Based Agents
Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results based on Proximal Policy Optimization (PPO) and Gated Recurrent Unit (GRU) underline the strong memory dependency of the contributed environments. |
| Researcher Affiliation | Academia | TU Dortmund University; Rhine-Waal University of Applied Sciences; LIACS, Universiteit Leiden |
| Pseudocode | No | The paper includes diagrams of model architecture (Figure 15) and data processing (Figure 16) but does not provide structured pseudocode or algorithm blocks explicitly labeled as such. |
| Open Source Code | Yes | Source Code: https://github.com/MarcoMeter/drl-memory-gym/ and Source Code: https://github.com/MarcoMeter/episodic-transformer-memory-ppo |
| Open Datasets | Yes | We propose Memory Gym as a novel and open source benchmark consisting of three unique environments: Mortar Mayhem, Mystery Path, and Searing Spotlights. ... All environments are procedurally generated to evaluate the agent's ability to generalize to unseen levels (or seeds). |
| Dataset Splits | Yes | All training runs utilize 100,000 environment seeds. Generalization is assessed on 30 novel seeds, which are repeated 5 times. Hence, each data point aggregates 750 episodes. |
| Hardware Specification | Yes | These experiments are run on an NVIDIA A100 Tensor-Core-GPU and an AMD EPYC 7542 CPU (32 cores). |
| Software Dependencies | Yes | Memory Gym's significant dependencies are gym (Brockman et al., 2016) and PyGame. This allows Memory Gym to be easily set up and executed on Linux, macOS, and Windows. ... # Create Anaconda environment $ conda create -n memory-gym python=3.7 --yes |
| Experiment Setup | Yes | Table 6: These are the hyperparameters that we used for all training runs. The sequence length is dynamically set by the longest episode inside the batch of the gathered training data. The experiments on Searing Spotlights and Mystery Path utilized a fixed sequence length of 128. The learning rate and the entropy coefficient decay linearly. (Table 6 then lists specific values for each hyperparameter: Training Seeds 100000, Worker Steps 512, Batch Size 16384, etc.) |
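The linearly decaying learning rate and entropy coefficient described in Table 6 can be sketched as a simple annealing schedule. This is a minimal illustration: the `linear_decay` helper and its initial/final values are assumptions for demonstration, not the paper's exact schedule; only the config values marked as quoted come from Table 6.

```python
def linear_decay(initial: float, final: float, step: int, total_steps: int) -> float:
    """Linearly anneal a coefficient from `initial` to `final` over `total_steps`.

    Values are clamped so the schedule stays at `final` once training finishes.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial + (final - initial) * frac


# Hyperparameters quoted from Table 6 of the paper; the full table lists more.
config = {
    "training_seeds": 100_000,  # quoted: Training Seeds 100000
    "worker_steps": 512,        # quoted: Worker Steps 512
    "batch_size": 16_384,       # quoted: Batch Size 16384
    "sequence_length": 128,     # quoted: fixed for Searing Spotlights / Mystery Path
}

# Illustrative decay: the concrete initial/final values here are placeholders,
# not taken from the paper.
lr = linear_decay(initial=3e-4, final=1e-5, step=500, total_steps=1_000)
```

The same helper would drive the entropy-coefficient decay; per the quoted setup, the sequence length is otherwise set dynamically to the longest episode in each training batch.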