Q-functionals for Value-Based Continuous Control

Authors: Samuel Lobel, Sreehari Rammohan, Bowen He, Shangqun Yu, George Konidaris

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
Researcher Affiliation | Academia | 1 Brown University, 2 University of Massachusetts, Amherst
Pseudocode | Yes | Algorithm 1: Q-functional action-evaluation / selection (see the sketch after this table)
Open Source Code | Yes | Reproducing code can be found at the linked repository. Code available at https://github.com/samlobel/q_functionals
Open Datasets | Yes | We compare these four methods on the OpenAI Gym continuous control suite (Brockman et al. 2016; Todorov, Erez, and Tassa 2012).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing.
Hardware Specification | Yes | For a single batch of 1024 states, we evaluate an increasing number of actions on the Hopper task (action dimension of 3) for 100 iterations. We find that a rank 3 Legendre Q-functional evaluates actions roughly 3.5 times faster on a single Nvidia 2080-ti GPU than a neural network that takes in both state and action as inputs.
Software Dependencies | No | The paper mentions the use of OpenAI Gym and standard frameworks but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For all benchmark experiments, we use the Legendre basis with rank 3, and use 1,000 samples for action-selection both in bootstrapping and interaction. Details on environments and architectural choices can be found in the Appendix.
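
The pseudocode and experiment-setup rows above reference Algorithm 1 (Q-functional action-evaluation / selection), a rank-3 Legendre basis, and 1,000 sampled actions per selection step. The sketch below illustrates how such a scheme can work: a coefficient network maps each state to weights over a Legendre basis of the action space, so many candidate actions are scored with a single matrix product and the best one is taken. This is a minimal sketch under stated assumptions, not the authors' implementation: the full tensor-product basis construction, uniform action sampling in [-1, 1], and the stand-in linear coefficient network are all illustrative choices; consult the linked repository for the actual code.

```python
# Hedged sketch of Q-functional action-evaluation / selection (in the spirit of
# Algorithm 1). Assumptions not taken from the paper's code: a full tensor-product
# Legendre basis of degree <= 3 per action dimension, uniform action sampling in
# [-1, 1], and a random linear stand-in for the learned coefficient network.
import itertools
import numpy as np
from numpy.polynomial import legendre

RANK = 3          # max Legendre degree per action dimension (paper: rank 3)
N_SAMPLES = 1000  # sampled actions per state (paper: 1,000)

def legendre_features(actions: np.ndarray) -> np.ndarray:
    """Map actions in [-1, 1]^d to tensor-product Legendre features.

    actions: (n, d) -> features: (n, (RANK + 1) ** d)
    """
    n, d = actions.shape
    # Per-dimension Legendre polynomial values P_0..P_RANK, shape (n, d, RANK + 1).
    per_dim = np.stack(
        [legendre.legvander(actions[:, j], RANK) for j in range(d)], axis=1
    )
    # One polynomial per dimension, multiplied together, for every degree combination.
    feats = []
    for degrees in itertools.product(range(RANK + 1), repeat=d):
        term = np.ones(n)
        for j, deg in enumerate(degrees):
            term = term * per_dim[:, j, deg]
        feats.append(term)
    return np.stack(feats, axis=1)

def select_actions(states, coefficient_net, action_dim, rng):
    """Sample N_SAMPLES actions per state, score them all against that state's
    basis coefficients in one batched product, and return the argmax actions."""
    batch = states.shape[0]
    coeffs = coefficient_net(states)                        # (batch, n_basis)
    candidates = rng.uniform(-1.0, 1.0, size=(batch, N_SAMPLES, action_dim))
    feats = legendre_features(candidates.reshape(-1, action_dim))
    feats = feats.reshape(batch, N_SAMPLES, -1)             # (batch, N_SAMPLES, n_basis)
    q_values = np.einsum("bnk,bk->bn", feats, coeffs)       # Q(s, a) for every sample
    best = q_values.argmax(axis=1)
    return candidates[np.arange(batch), best], q_values.max(axis=1)

# Toy usage with a random linear stand-in for the learned coefficient network.
rng = np.random.default_rng(0)
state_dim, action_dim = 11, 3                               # Hopper-sized dimensions
n_basis = (RANK + 1) ** action_dim
W = rng.normal(size=(state_dim, n_basis)) * 0.1
coefficient_net = lambda s: s @ W
states = rng.normal(size=(4, state_dim))
actions, values = select_actions(states, coefficient_net, action_dim, rng)
print(actions.shape, values.shape)                          # (4, 3) (4,)
```

Because every candidate action is scored by one batched product against the per-state coefficients, evaluating 1,000 actions costs roughly one matrix multiply rather than 1,000 forward passes through a state-action network, which is the kind of saving the hardware row's timing comparison describes.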