Remember and Forget for Experience Replay
Authors: Guido Novati, Petros Koumoutsakos
ICML 2019 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that ReF-ER consistently improves the performance of continuous-action, off-policy RL on fully observable benchmarks and partially observable flow control problems. In this section we couple ReF-ER, conventional ER and PER with one method from each of the three main classes of deep continuous-action RL algorithms: DDPG, NAF, and V-RACER. The performance of each combination of algorithms is measured on the MuJoCo (Todorov et al., 2012) tasks of OpenAI Gym (Brockman et al., 2016) by plotting the mean cumulative reward R = Σ_t r_t. Each plot tracks the average R among all episodes entering the RM within intervals of 2×10^5 time steps, averaged over five differently seeded training trials (a sketch of this metric computation follows the table). |
| Researcher Affiliation | Academia | Guido Novati 1 Petros Koumoutsakos 1 1Computational Science & Engineering Laboratory, ETH Zurich, Zurich, Switzerland. Correspondence to: Guido Novati <novatig@ethz.ch>, Petros Koumoutsakos <petros@ethz.ch>. |
| Pseudocode | No | The paper describes its algorithms and methods using prose and mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code to reproduce all present results is available on GitHub: https://github.com/cselab/smarties |
| Open Datasets | Yes | We analyze ReF-ER on the OpenAI Gym (Brockman et al., 2016) as well as fluid-dynamics simulations to show that it reliably obtains competitive results without requiring extensive HP optimization. |
| Dataset Splits | No | The paper mentions 'five differently seeded training trials' and 'tracks the average R among all episodes entering the RM within intervals of 2×10^5 time steps'. However, it does not explicitly define or provide percentages/counts for training, validation, or test dataset splits in the traditional supervised learning sense. |
| Hardware Specification | No | The paper states that 'Computational resources were provided by Swiss National Supercomputing Centre (CSCS) Project s658 and s929.' but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015)' for optimization and refers to the 'OpenAI Gym (Brockman et al., 2016)' and 'MuJoCo (Todorov et al., 2012)' environments. However, it does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We use A=5×10^-7, C=4, D=0.1, and N=2^18 for all results with ReF-ER in the main text. All results were obtained with Adam (Kingma & Ba, 2015) using a batch size of 256 for all algorithms and tasks. The replay memory contains 10^6 experiences. For both DDPG and NAF, we include exploratory Gaussian noise in the policy π^w = m^w + N(0, σ²I) with σ=0.2. LSTM networks (2 layers of 32 cells with back-propagation window of 16 steps). (A sketch of this configuration follows the table.) |
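
A minimal sketch of the reported evaluation metric, assuming each training trial logs pairs of (cumulative environment time step, episode return R = Σ_t r_t): episode returns are binned into windows of 2×10^5 time steps and the resulting curves are averaged across the five seeds. The logging format and function names below are illustrative, not taken from the paper's code.

```python
import numpy as np

WINDOW = 2 * 10**5  # time-step interval used for averaging in the paper's plots


def binned_returns(timesteps, returns, window=WINDOW):
    """Average episode returns R = sum_t r_t within consecutive time-step windows.

    timesteps : 1-D array of cumulative environment steps at which each episode ended
    returns   : 1-D array of cumulative rewards R, one per episode
    """
    timesteps = np.asarray(timesteps)
    returns = np.asarray(returns)
    n_bins = int(np.ceil(timesteps.max() / window))
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = (timesteps >= b * window) & (timesteps < (b + 1) * window)
        if mask.any():
            means[b] = returns[mask].mean()
    return means


def average_over_seeds(per_seed_curves):
    """Average the binned curves of differently seeded trials (five in the paper)."""
    max_len = max(len(c) for c in per_seed_curves)
    padded = np.full((len(per_seed_curves), max_len), np.nan)
    for i, c in enumerate(per_seed_curves):
        padded[i, : len(c)] = c
    return np.nanmean(padded, axis=0)
```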
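A hedged sketch of the experiment setup quoted above, collecting the reported hyperparameters and the Gaussian exploration policy used with DDPG and NAF. The dictionary keys and the `gaussian_exploration` helper are illustrative names introduced here; the ReF-ER update rule itself is not reproduced.

```python
import numpy as np

# ReF-ER hyperparameters as reported in the paper (names follow the paper's notation).
REFER_HYPERPARAMS = {
    "A": 5e-7,
    "C": 4,
    "D": 0.1,
    "N": 2**18,
}

# Shared training setup reported in the paper.
TRAINING_SETUP = {
    "optimizer": "Adam",
    "batch_size": 256,
    "replay_memory_size": 10**6,
    "exploration_sigma": 0.2,   # Gaussian noise for DDPG and NAF
    "lstm_layers": 2,
    "lstm_cells_per_layer": 32,
    "bptt_window": 16,
}


def gaussian_exploration(mean_action, sigma=TRAINING_SETUP["exploration_sigma"], rng=np.random):
    """Exploratory policy pi^w = m^w + N(0, sigma^2 I) used with DDPG and NAF."""
    return mean_action + sigma * rng.normal(size=np.shape(mean_action))
```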