Remember and Forget for Experience Replay
Authors: Guido Novati, Petros Koumoutsakos
ICML 2019 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that ReF-ER consistently improves the performance of continuous-action, off-policy RL on fully observable benchmarks and partially observable flow control problems. In this section we couple ReF-ER, conventional ER and PER with one method from each of the three main classes of deep continuous-action RL algorithms: DDPG, NAF, and V-RACER. The performance of each combination of algorithms is measured on the MuJoCo (Todorov et al., 2012) tasks of OpenAI Gym (Brockman et al., 2016) by plotting the mean cumulative reward R = Σ_t r_t. Each plot tracks the average R among all episodes entering the RM within intervals of 2×10^5 time steps, averaged over five differently seeded training trials (a sketch of this metric computation follows the table). |
| Researcher Affiliation | Academia | Guido Novati 1 Petros Koumoutsakos 1 1Computational Science & Engineering Laboratory, ETH Zurich, Zurich, Switzerland. Correspondence to: Guido Novati <novatig@ethz.ch>, Petros Koumoutsakos <petros@ethz.ch>. |
| Pseudocode | No | The paper describes its algorithms and methods using prose and mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code to reproduce all present results is available on GitHub: https://github.com/cselab/smarties |
| Open Datasets | Yes | We analyze ReF-ER on the OpenAI Gym (Brockman et al., 2016) as well as fluid-dynamics simulations to show that it reliably obtains competitive results without requiring extensive HP optimization. |
| Dataset Splits | No | The paper mentions 'five differently seeded training trials' and 'tracks the average R among all episodes entering the RM within intervals of 2×10^5 time steps'. However, it does not explicitly define or provide percentages/counts for training, validation, or test dataset splits in the traditional supervised learning sense. |
| Hardware Specification | No | The paper states that 'Computational resources were provided by Swiss National Supercomputing Centre (CSCS) Project s658 and s929.' but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015)' for optimization and refers to the 'OpenAI Gym (Brockman et al., 2016)' and 'MuJoCo (Todorov et al., 2012)' environments. However, it does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We use A=5×10^-7, C=4, D=0.1, and N=2^18 for all results with ReF-ER in the main text. All results were obtained with Adam (Kingma & Ba, 2015) using a batch size of 256 for all algorithms and tasks. The replay memory contains 10^6 experiences. For both DDPG and NAF, we include exploratory Gaussian noise in the policy π^w = m^w + N(0, σ²I) with σ=0.2. LSTM networks (2 layers of 32 cells with back-propagation window of 16 steps). (A sketch of this configuration follows the table.) |
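
A minimal sketch of the reported evaluation metric, assuming each training trial logs pairs of (cumulative environment time step, episode return R = Σ_t r_t): episode returns are binned into windows of 2×10^5 time steps and the resulting curves are averaged across the five seeds. The logging format and function names below are illustrative, not taken from the paper's code.

```python
import numpy as np

WINDOW = 2 * 10**5  # time-step interval used for averaging in the paper's plots


def binned_returns(timesteps, returns, window=WINDOW):
    """Average episode returns R = sum_t r_t within consecutive time-step windows.

    timesteps : 1-D array of cumulative environment steps at which each episode ended
    returns   : 1-D array of cumulative rewards R, one per episode
    """
    timesteps = np.asarray(timesteps)
    returns = np.asarray(returns)
    n_bins = int(np.ceil(timesteps.max() / window))
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = (timesteps >= b * window) & (timesteps < (b + 1) * window)
        if mask.any():
            means[b] = returns[mask].mean()
    return means


def average_over_seeds(per_seed_curves):
    """Average the binned curves of differently seeded trials (five in the paper)."""
    max_len = max(len(c) for c in per_seed_curves)
    padded = np.full((len(per_seed_curves), max_len), np.nan)
    for i, c in enumerate(per_seed_curves):
        padded[i, : len(c)] = c
    return np.nanmean(padded, axis=0)
```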
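A hedged sketch of the experiment setup quoted above, collecting the reported hyperparameters and the Gaussian exploration policy used with DDPG and NAF. The dictionary keys and the `gaussian_exploration` helper are illustrative names introduced here; the ReF-ER update rule itself is not reproduced.

```python
import numpy as np

# ReF-ER hyperparameters as reported in the paper (names follow the paper's notation).
REFER_HYPERPARAMS = {
    "A": 5e-7,
    "C": 4,
    "D": 0.1,
    "N": 2**18,
}

# Shared training setup reported in the paper.
TRAINING_SETUP = {
    "optimizer": "Adam",
    "batch_size": 256,
    "replay_memory_size": 10**6,
    "exploration_sigma": 0.2,   # Gaussian noise for DDPG and NAF
    "lstm_layers": 2,
    "lstm_cells_per_layer": 32,
    "bptt_window": 16,
}


def gaussian_exploration(mean_action, sigma=TRAINING_SETUP["exploration_sigma"], rng=np.random):
    """Exploratory policy pi^w = m^w + N(0, sigma^2 I) used with DDPG and NAF."""
    return mean_action + sigma * rng.normal(size=np.shape(mean_action))
```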