Continuous Control with Action Quantization from Demonstrations

Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

ICML 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically evaluate this discretization strategy on three downstream task setups: Reinforcement Learning with demonstrations, Reinforcement Learning with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning. |
| Researcher Affiliation | Collaboration | (1) Google Research, Brain Team; (2) Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL. |
| Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "...and make the code available: https://github.com/google-research/google-research/tree/master/aquadem" |
| Open Datasets | Yes | We consider the Adroit tasks (Rajeswaran et al., 2017) represented in Figure 11, for which human demonstrations are available (25 episodes acquired using a virtual reality system). ... We consider the Robodesk tasks (Kannan et al., 2021) shown in Figure 11, for which we acquired play data. ... We evaluate the resulting algorithm on the D4RL locomotion tasks and provide performance against state-of-the-art offline RL algorithms. |
| Dataset Splits | No | The paper describes training on environment interactions and evaluating on episodes, but it does not specify explicit dataset splits (e.g., percentages or counts) designated for validation from a fixed dataset. |
| Hardware Specification | No | The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper notes that the "implementation is from Acme (Hoffman et al., 2020)" but does not provide version numbers for Acme or for other key software dependencies such as TensorFlow. |
| Experiment Setup | Yes | For all experiments, we detail the network architectures, hyperparameter search, and training procedures in the Appendix and we provide videos of all the agents trained in the website. ... Table 3 gives the hyperparameter sweep for the AQuaDQN agent. |
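The discretization strategy assessed in the first row can be illustrated with a minimal sketch: cluster the continuous actions found in demonstrations into a small candidate set, so that a discrete-action agent (e.g. a DQN) can act in a continuous control task by picking a candidate index. This sketch uses a plain k-means over demonstration actions as a stand-in; the paper's AQuaDem approach instead learns state-conditioned candidate actions with a neural network, and all names below are illustrative.

```python
import numpy as np

def quantize_actions(demo_actions, n_candidates=3, n_iters=50, seed=0):
    """Cluster demonstration actions into a discrete candidate set (k-means)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen demonstration actions.
    idx = rng.choice(len(demo_actions), n_candidates, replace=False)
    centroids = demo_actions[idx].copy()
    for _ in range(n_iters):
        # Assign each demonstration action to its nearest centroid.
        dists = np.linalg.norm(demo_actions[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for k in range(n_candidates):
            members = demo_actions[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids

# Toy demonstration data: 2-D continuous actions scattered around three modes.
rng = np.random.default_rng(1)
modes = np.array([[0.8, 0.0], [-0.8, 0.0], [0.0, 0.8]])
demo_actions = np.concatenate(
    [m + 0.05 * rng.standard_normal((100, 2)) for m in modes]
)

candidates = quantize_actions(demo_actions, n_candidates=3)
print(candidates.shape)  # → (3, 2)

# A discrete agent now selects an index in {0, 1, 2}; the environment
# receives the corresponding continuous candidate action.
discrete_choice = 1
continuous_action = candidates[discrete_choice]
```

The design point this illustrates is why the downstream agents in the paper can be discrete (DQN-style) despite the tasks being continuous: the hard part, choosing *which* continuous actions are worth considering, is delegated to the demonstration data.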