Q-functionals for Value-Based Continuous Control

Authors: Samuel Lobel, Sreehari Rammohan, Bowen He, Shangqun Yu, George Konidaris

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
Researcher Affiliation | Academia | 1 Brown University, 2 University of Massachusetts, Amherst
Pseudocode | Yes | Algorithm 1: Q-functional action-evaluation / selection (see the sketch after this table)
Open Source Code | Yes | Reproducing code can be found at the linked repository. Code available at https://github.com/samlobel/q_functionals
Open Datasets | Yes | We compare these four methods on the OpenAI Gym continuous control suite (Brockman et al. 2016; Todorov, Erez, and Tassa 2012).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing.
Hardware Specification | Yes | For a single batch of 1024 states, we evaluate an increasing number of actions on the Hopper task (action dimension of 3) for 100 iterations. We find that a rank 3 Legendre Q-functional evaluates actions roughly 3.5 times faster on a single Nvidia 2080-ti GPU than a neural network that takes in both state and action as inputs.
Software Dependencies | No | The paper mentions the use of OpenAI Gym and standard frameworks but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For all benchmark experiments, we use the Legendre basis with rank 3, and use 1,000 samples for action-selection both in bootstrapping and interaction. Details on environments and architectural choices can be found in the Appendix.
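
The pseudocode and experiment-setup rows above reference Algorithm 1 (Q-functional action-evaluation / selection), a rank-3 Legendre basis, and 1,000 sampled actions per selection step. The sketch below illustrates how such a scheme can work: a coefficient network maps each state to weights over a Legendre basis of the action space, so many candidate actions are scored with a single matrix product and the best one is taken. This is a minimal sketch under stated assumptions, not the authors' implementation: the full tensor-product basis construction, uniform action sampling in [-1, 1], and the stand-in linear coefficient network are all illustrative choices; consult the linked repository for the actual code.

```python
# Hedged sketch of Q-functional action-evaluation / selection (in the spirit of
# Algorithm 1). Assumptions not taken from the paper's code: a full tensor-product
# Legendre basis of degree <= 3 per action dimension, uniform action sampling in
# [-1, 1], and a random linear stand-in for the learned coefficient network.
import itertools
import numpy as np
from numpy.polynomial import legendre

RANK = 3          # max Legendre degree per action dimension (paper: rank 3)
N_SAMPLES = 1000  # sampled actions per state (paper: 1,000)

def legendre_features(actions: np.ndarray) -> np.ndarray:
    """Map actions in [-1, 1]^d to tensor-product Legendre features.

    actions: (n, d) -> features: (n, (RANK + 1) ** d)
    """
    n, d = actions.shape
    # Per-dimension Legendre polynomial values P_0..P_RANK, shape (n, d, RANK + 1).
    per_dim = np.stack(
        [legendre.legvander(actions[:, j], RANK) for j in range(d)], axis=1
    )
    # One polynomial per dimension, multiplied together, for every degree combination.
    feats = []
    for degrees in itertools.product(range(RANK + 1), repeat=d):
        term = np.ones(n)
        for j, deg in enumerate(degrees):
            term = term * per_dim[:, j, deg]
        feats.append(term)
    return np.stack(feats, axis=1)

def select_actions(states, coefficient_net, action_dim, rng):
    """Sample N_SAMPLES actions per state, score them all against that state's
    basis coefficients in one batched product, and return the argmax actions."""
    batch = states.shape[0]
    coeffs = coefficient_net(states)                        # (batch, n_basis)
    candidates = rng.uniform(-1.0, 1.0, size=(batch, N_SAMPLES, action_dim))
    feats = legendre_features(candidates.reshape(-1, action_dim))
    feats = feats.reshape(batch, N_SAMPLES, -1)             # (batch, N_SAMPLES, n_basis)
    q_values = np.einsum("bnk,bk->bn", feats, coeffs)       # Q(s, a) for every sample
    best = q_values.argmax(axis=1)
    return candidates[np.arange(batch), best], q_values.max(axis=1)

# Toy usage with a random linear stand-in for the learned coefficient network.
rng = np.random.default_rng(0)
state_dim, action_dim = 11, 3                               # Hopper-sized dimensions
n_basis = (RANK + 1) ** action_dim
W = rng.normal(size=(state_dim, n_basis)) * 0.1
coefficient_net = lambda s: s @ W
states = rng.normal(size=(4, state_dim))
actions, values = select_actions(states, coefficient_net, action_dim, rng)
print(actions.shape, values.shape)                          # (4, 3) (4,)
```

Because every candidate action is scored by one batched product against the per-state coefficients, evaluating 1,000 actions costs roughly one matrix multiply rather than 1,000 forward passes through a state-action network, which is the kind of saving the hardware row's timing comparison describes.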