Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Continuous Control with Action Quantization from Demonstrations

Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

ICML 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We empirically evaluate this discretization strategy on three downstream task setups: Reinforcement Learning with demonstrations, Reinforcement Learning with play data (demonstrations of a human playing in an environment but not solving any specific task), and Imitation Learning.
Researcher Affiliation | Collaboration | Google Research, Brain Team; Univ. de Lille, CNRS, Inria Scool, UMR 9189 CRIStAL.
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "... and make the code available: https://github.com/google-research/google-research/tree/master/aquadem"
Open Datasets | Yes | "We consider the Adroit tasks (Rajeswaran et al., 2017) represented in Figure 11, for which human demonstrations are available (25 episodes acquired using a virtual reality system). ... We consider the Robodesk tasks (Kannan et al., 2021) shown in Figure 11, for which we acquired play data. ... We evaluate the resulting algorithm on the D4RL locomotion tasks and provide performance against state-of-the-art offline RL algorithms."
Dataset Splits | No | The paper describes training on environment interactions and evaluating on episodes, but it does not specify explicit dataset splits (e.g., percentages or counts) specifically designated for 'validation' purposes from a fixed dataset.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions that the 'implementation is from Acme (Hoffman et al., 2020)' but does not provide specific version numbers for Acme or other key software dependencies such as PyTorch or TensorFlow.
Experiment Setup | Yes | "For all experiments, we detail the networks architectures, hyperparameters search, and training procedures in the Appendix and we provide videos of all the agents trained in the website. ... Table 3. Hyperparameter sweep for the AQuaDQN agent."