Solving Continuous Control via Q-learning

Authors: Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. [...] We evaluate performance of the DecQN agent on several continuous control environments from the DeepMind Control Suite (Tunyasuvunakool et al., 2020) and Meta-World (Yu et al., 2020). (A minimal sketch of this decomposition follows the table.) |
| Researcher Affiliation | Collaboration | Tim Seyde (MIT CSAIL), Peter Werner (MIT CSAIL), Wilko Schwarting (MIT CSAIL), Igor Gilitschenski (University of Toronto), Martin Riedmiller (DeepMind), Daniela Rus (MIT CSAIL), Markus Wulfmeier (DeepMind) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the Acme framework and the baseline agents (Dreamer-v2, DrQ-v2) are open source, but it does not provide a direct link to, or an explicit statement about, an open-source release of its own DecQN implementation. The project website linked in a footnote *does* provide the code, but the paper itself does not. |
| Open Datasets | Yes | We evaluate performance of the DecQN agent on several continuous control environments from the DeepMind Control Suite (Tunyasuvunakool et al., 2020) and Meta-World (Yu et al., 2020) |
| Dataset Splits | No | The paper describes using multiple seeds and varying hyperparameters, but does not specify explicit train/validation/test splits with percentages or sample counts for the continuous control tasks. In reinforcement learning the environment typically serves as both the training and evaluation ground, so traditional dataset-split information is not provided. |
| Hardware Specification | Yes | Experiments on Control Suite and Matrix Game tasks were conducted on a single NVIDIA V100 GPU with 4 CPU cores (state-based) or 20 CPU cores (pixel-based). Experiments in Meta-World and Isaac Gym were conducted on a single NVIDIA 2080Ti with 4 CPU cores. |
| Software Dependencies | No | The paper mentions implementing DecQN within the Acme framework in TensorFlow, with a PyTorch reimplementation for the Mini Cheetah task, but does not provide specific version numbers for TensorFlow, PyTorch, or Acme. |
| Experiment Setup | Yes | We provide hyperparameter values of DecQN used for benchmarking in Table 2. A constant set of hyperparameters is used throughout all experiments, with modifications to the network architecture for vision-based tasks. [...] Table 2: DecQN hyperparameters for state- and pixel-based control. (Includes learning rate, batch size, discount γ, etc.) |
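
The mechanism quoted under Research Type is compact enough to illustrate. Below is a minimal PyTorch-style sketch of per-dimension value decomposition with bang-bang action discretization; it is not the authors' released implementation, and the class name `DecomposedQNetwork`, the MLP sizes, and the mean aggregation over action dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's code) of a decomposed Q-network:
# each action dimension gets its own small set of discrete (bang-bang) choices and its
# own utility head; the joint Q-value is aggregated from the per-dimension utilities.
import torch
import torch.nn as nn


class DecomposedQNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_action_dims: int, bins_per_dim: int = 2, hidden: int = 256):
        super().__init__()
        self.bins_per_dim = bins_per_dim  # 2 bins = bang-bang {-1, +1}
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One utility head per action dimension, each scoring that dimension's bins.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, bins_per_dim) for _ in range(num_action_dims)]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        """Per-dimension utilities of shape [batch, num_action_dims, bins_per_dim]."""
        z = self.torso(obs)
        return torch.stack([head(z) for head in self.heads], dim=1)

    def joint_q(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """Joint Q-value as the mean of the utilities of the selected bins (linear decomposition)."""
        utilities = self.forward(obs)                          # [B, D, bins]
        chosen = utilities.gather(-1, actions.unsqueeze(-1))   # [B, D, 1]
        return chosen.squeeze(-1).mean(dim=-1)                 # [B]

    def greedy_actions(self, obs: torch.Tensor) -> torch.Tensor:
        """Per-dimension argmax; decoding each dimension independently avoids enumerating
        the exponentially many joint discrete actions."""
        return self.forward(obs).argmax(dim=-1)                # [B, D]
```

Bin indices map back to continuous controls via, e.g., `u = 2.0 * actions.float() / (bins_per_dim - 1) - 1.0`, which for two bins yields the extreme bang-bang actions {-1, +1}.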