Q-functionals for Value-Based Continuous Control
Authors: Samuel Lobel, Sreehari Rammohan, Bowen He, Shangqun Yu, George Konidaris
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks. |
| Researcher Affiliation | Academia | 1 Brown University 2 University of Massachusetts, Amherst |
| Pseudocode | Yes | Algorithm 1: Q-functional action-evaluation/selection (a hedged sketch follows this table) |
| Open Source Code | Yes | Reproducing code can be found at the linked repository [1]. [1] Code available at https://github.com/samlobel/q-functionals |
| Open Datasets | Yes | We compare these four methods on the OpenAI Gym continuous control suite (Brockman et al. 2016; Todorov, Erez, and Tassa 2012). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing. |
| Hardware Specification | Yes | For a single batch of 1024 states, we evaluate an increasing number of actions on the Hopper task (action dimension of 3) for 100 iterations. We find that a rank-3 Legendre Q-functional evaluates actions roughly 3.5 times faster on a single NVIDIA 2080 Ti GPU than a neural network that takes both state and action as inputs. |
| Software Dependencies | No | The paper mentions the use of OpenAI Gym and standard frameworks but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | For all benchmark experiments, we use the Legendre basis with rank 3, and use 1,000 samples for action-selection both in bootstrapping and interaction. Details on environments and architectural choices can be found in the Appendix. |
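
The quoted rows pin down enough detail (a rank-3 Legendre basis, 1,000 sampled actions for selection) to illustrate what Algorithm 1's action evaluation and selection might look like. The PyTorch sketch below is a hypothetical reconstruction under those assumptions, not the authors' implementation (that lives in the linked repository): the names `legendre_basis`, `LegendreQFunctional`, and `select_action`, the tensor-product coefficient layout, the hidden width, and the uniform action sampler are all illustrative choices. The core idea it demonstrates is that the network maps a state to coefficients of a polynomial over actions, so scoring many candidate actions costs one network pass plus cheap polynomial algebra.

```python
import itertools

import torch
import torch.nn as nn


def legendre_basis(actions: torch.Tensor, rank: int) -> torch.Tensor:
    """Legendre polynomials P_0..P_rank, applied per action dimension.

    actions: (..., action_dim) with entries in [-1, 1]
    returns: (..., action_dim, rank + 1)
    """
    polys = [torch.ones_like(actions), actions]
    for n in range(1, rank):
        # Bonnet's recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        polys.append(((2 * n + 1) * actions * polys[-1] - n * polys[-2]) / (n + 1))
    return torch.stack(polys[: rank + 1], dim=-1)


class LegendreQFunctional(nn.Module):
    """Maps a state to coefficients of a polynomial over actions (hypothetical
    layout: one coefficient per tensor-product multi-index of Legendre degrees)."""

    def __init__(self, state_dim: int, action_dim: int, rank: int = 3, hidden: int = 256):
        super().__init__()
        self.action_dim, self.rank = action_dim, rank
        # One coefficient per multi-index (i_1, ..., i_d) with each i_j <= rank.
        self.multi_indices = list(itertools.product(range(rank + 1), repeat=action_dim))
        self.coeff_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, len(self.multi_indices)),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """states: (B, state_dim); actions: (B, N, action_dim) -> Q-values: (B, N)."""
        coeffs = self.coeff_net(states)             # (B, K), one pass per state
        basis = legendre_basis(actions, self.rank)  # (B, N, d, rank + 1)
        features = []
        for degrees in self.multi_indices:          # K = (rank + 1) ** d terms
            term = basis[..., 0, degrees[0]]
            for j in range(1, self.action_dim):
                term = term * basis[..., j, degrees[j]]
            features.append(term)
        features = torch.stack(features, dim=-1)    # (B, N, K)
        return (coeffs.unsqueeze(1) * features).sum(dim=-1)


@torch.no_grad()
def select_action(qf: LegendreQFunctional, state: torch.Tensor,
                  num_samples: int = 1000) -> torch.Tensor:
    """Pick the best of `num_samples` uniformly sampled actions in [-1, 1]^d."""
    actions = torch.rand(1, num_samples, qf.action_dim) * 2 - 1
    q_values = qf(state.unsqueeze(0), actions)      # (1, num_samples)
    return actions[0, q_values.argmax(dim=-1).item()]


# Example at Hopper's sizes (11-dim state, 3-dim action), rank 3, 1,000 samples.
qf = LegendreQFunctional(state_dim=11, action_dim=3, rank=3)
best_action = select_action(qf, torch.randn(11), num_samples=1000)
```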
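
In the same hedged spirit, the hardware row's timing experiment (a batch of 1,024 states, many actions, 100 iterations on Hopper) can be mimicked with a micro-benchmark. The `time_fn` helper and the baseline network's shape are assumptions, `LegendreQFunctional` is reused from the sketch above, and absolute numbers will depend entirely on hardware; the point is only the structural contrast the paper describes: the baseline must run a full forward pass per (state, action) pair.

```python
import time

import torch
import torch.nn as nn


def time_fn(fn, iters: int = 100) -> float:
    """Wall-clock time for `iters` calls, synchronizing around any GPU work."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.perf_counter() - start


state_dim, action_dim, batch, num_actions = 11, 3, 1024, 100
qf = LegendreQFunctional(state_dim, action_dim, rank=3)
baseline = nn.Sequential(  # assumed shape for a standard Q(s, a) network
    nn.Linear(state_dim + action_dim, 256), nn.ReLU(), nn.Linear(256, 1)
)

states = torch.randn(batch, state_dim)
actions = torch.rand(batch, num_actions, action_dim) * 2 - 1

# Q-functional: the state network runs once per state; each extra action
# adds only polynomial evaluation.
t_qf = time_fn(lambda: qf(states, actions))

# Standard Q-network: every (state, action) pair is a full forward pass.
pairs = torch.cat([states.unsqueeze(1).expand(-1, num_actions, -1), actions], dim=-1)
t_base = time_fn(lambda: baseline(pairs))

print(f"Q-functional: {t_qf:.3f}s  standard network: {t_base:.3f}s")
```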