Real-Time Reinforcement Learning
Authors: Simon Ramstedt, Chris Pal
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm Soft Actor-Critic in both real-time and non-real-time settings. |
| Researcher Affiliation | Collaboration | Simon Ramstedt (Mila, Element AI, Université de Montréal, simonramstedt@gmail.com); Christopher Pal (Mila, Element AI, Polytechnique Montréal, christopher.pal@polymtl.ca) |
| Pseudocode | No | The paper describes algorithms using mathematical equations and prose, but it does not include a distinct 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code and videos can be found at github.com/rmst/rtrl. |
| Open Datasets | Yes | We compare Real-Time Actor-Critic to Soft Actor-Critic (Haarnoja et al., 2018a) on several OpenAI Gym/MuJoCo benchmark environments (Brockman et al., 2016; Todorov et al., 2012) as well as on two Avenue autonomous driving environments with visual observations (Ibrahim et al., 2019). |
| Dataset Splits | No | The paper states that experiments were run on 'OpenAI Gym/MuJoCo benchmark environments' and 'Avenue autonomous driving environments', which are standard. However, it does not provide specific percentages or sample counts for training, validation, or test splits, nor does it cite a source for predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym/MuJoCo' and the 'Avenue simulator', but it does not specify exact version numbers for these or any other key software components, libraries, or programming languages. |
| Experiment Setup | Yes | The hyperparameters used can be found in Table 1. The hyperparameters used for the autonomous driving task are largely the same as for the MuJoCo tasks; however, we used a lower entropy reward scale (0.05) and lower learning rate (0.0002). |
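
To make the "real-time formulation" referenced in the Research Type row more concrete, the following is a minimal, illustrative Python sketch. It is not taken from the paper or the rtrl repository; all function and variable names (`real_time_rollout`, `env_step`, `policy`) are hypothetical. It only illustrates the core idea that the agent observes an augmented state containing its previous action, and the action it selects now is applied one environment step later.

```python
# Illustrative sketch (hypothetical names, not the authors' code): in the
# real-time setting the agent acts on (state, previous_action) and its new
# action only takes effect at the next environment step.
import numpy as np

def real_time_rollout(env_step, policy, x0, a0, horizon=200):
    """Roll out a policy under a one-step delay between acting and actuation."""
    x, a_prev = x0, a0
    total_reward = 0.0
    for _ in range(horizon):
        # The agent selects its next action from the augmented state ...
        a = policy(np.concatenate([x, a_prev]))
        # ... while the environment advances using the *previous* action.
        x, r = env_step(x, a_prev)
        a_prev = a
        total_reward += r
    return total_reward

# Toy 1-D linear system and linear policy, purely for illustration.
def env_step(x, a):
    x_next = 0.99 * x + 0.1 * a
    reward = -float(np.sum(x_next ** 2))
    return x_next, reward

policy = lambda s: -0.5 * s[:1]  # maps augmented state to a 1-D action
print(real_time_rollout(env_step, policy, x0=np.ones(1), a0=np.zeros(1)))
```

This sketch only demonstrates the delayed-action interaction loop; the actual RTAC training procedure and hyperparameters are described in the paper and the code at github.com/rmst/rtrl.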