Real-Time Reinforcement Learning
Authors: Simon Ramstedt, Chris Pal
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm Soft Actor-Critic in both real-time and non-real-time settings. |
| Researcher Affiliation | Collaboration | Simon Ramstedt (Mila, Element AI, Université de Montréal, simonramstedt@gmail.com); Christopher Pal (Mila, Element AI, Polytechnique Montréal, christopher.pal@polymtl.ca) |
| Pseudocode | No | The paper describes algorithms using mathematical equations and prose, but it does not include a distinct 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code and videos can be found at github.com/rmst/rtrl. |
| Open Datasets | Yes | We compare Real-Time Actor-Critic to Soft Actor-Critic (Haarnoja et al., 2018a) on several OpenAI Gym/MuJoCo benchmark environments (Brockman et al., 2016; Todorov et al., 2012) as well as on two Avenue autonomous driving environments with visual observations (Ibrahim et al., 2019). |
| Dataset Splits | No | The paper states that experiments were run on 'OpenAI Gym/MuJoCo benchmark environments' and 'Avenue autonomous driving environments', which are standard. However, it does not provide specific percentages or sample counts for training, validation, or test splits, nor does it cite a source for predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym/MuJoCo' and the 'Avenue simulator', but it does not specify exact version numbers for these or any other key software components, libraries, or programming languages. |
| Experiment Setup | Yes | The hyperparameters used can be found in Table 1. The hyperparameters used for the autonomous driving task are largely the same as for the MuJoCo tasks; however, we used a lower entropy reward scale (0.05) and lower learning rate (0.0002). |
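
To make the "real-time formulation" referenced in the Research Type row more concrete, the following is a minimal, illustrative Python sketch. It is not taken from the paper or the rtrl repository; all function and variable names (`real_time_rollout`, `env_step`, `policy`) are hypothetical. It only illustrates the core idea that the agent observes an augmented state containing its previous action, and the action it selects now is applied one environment step later.

```python
# Illustrative sketch (hypothetical names, not the authors' code): in the
# real-time setting the agent acts on (state, previous_action) and its new
# action only takes effect at the next environment step.
import numpy as np

def real_time_rollout(env_step, policy, x0, a0, horizon=200):
    """Roll out a policy under a one-step delay between acting and actuation."""
    x, a_prev = x0, a0
    total_reward = 0.0
    for _ in range(horizon):
        # The agent selects its next action from the augmented state ...
        a = policy(np.concatenate([x, a_prev]))
        # ... while the environment advances using the *previous* action.
        x, r = env_step(x, a_prev)
        a_prev = a
        total_reward += r
    return total_reward

# Toy 1-D linear system and linear policy, purely for illustration.
def env_step(x, a):
    x_next = 0.99 * x + 0.1 * a
    reward = -float(np.sum(x_next ** 2))
    return x_next, reward

policy = lambda s: -0.5 * s[:1]  # maps augmented state to a 1-D action
print(real_time_rollout(env_step, policy, x0=np.ones(1), a0=np.zeros(1)))
```

This sketch only demonstrates the delayed-action interaction loop; the actual RTAC training procedure and hyperparameters are described in the paper and the code at github.com/rmst/rtrl.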