Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
Authors: Harley E. Wiltzer, David Meger, Marc G. Bellemare
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of such an algorithm in a synthetic control problem. We simulate the performance of the FD-WGF Q-learning algorithm on a simple task based on a continuous MDP suggested by Munos (2004) as an example of an MDP whose value function does not satisfy the HJB equation in the usual sense. |
| Researcher Affiliation | Collaboration | 1McGill University, Montreal, Canada; 2Mila Quebec AI Institute; 3Google Brain, Montreal, Canada; 4CIFAR Fellow. |
| Pseudocode | Yes | Algorithm 1 Continuous-time distributional RL update |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a repository. |
| Open Datasets | Yes | We simulate the performance of the FD-WGF Q-learning algorithm on a simple task based on a continuous MDP suggested by Munos (2004) as an example of an MDP whose value function does not satisfy the HJB equation in the usual sense. |
| Dataset Splits | No | The paper does not specify exact percentages, sample counts, or refer to standard predefined splits for training, validation, and test datasets. |
| Hardware Specification | No | The paper does not provide specific details such as CPU/GPU models, memory, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions used for the implementation. |
| Experiment Setup | Yes | The discount factor is γ = 0.3, and observations occur at a frequency ω = 1 kHz. ... Algorithm 1 Continuous-time distributional RL update ... Require: WGF time parameter τ; Require: Learning rate α (see the illustrative sketch below). |
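
The table above quotes Algorithm 1's "Require" lines and the experiment constants without the algorithm body, so the following is only a minimal sketch, assuming placeholder names and values, of how those quoted hyperparameters (γ = 0.3, observations at 1 kHz, the WGF time parameter τ, and the learning rate α) would enter a generic particle-based distributional update loop. It is not the paper's FD-WGF Q-learning update; the τ-scaled noise term merely stands in for the smoothing role the WGF time parameter plays.

```python
"""Illustrative sketch only -- NOT the paper's Algorithm 1 (FD-WGF Q-learning).
Shows where the quoted hyperparameters (gamma = 0.3, 1 kHz observations, WGF
time parameter tau, learning rate alpha) would plug into a generic
particle-based distributional update; the tau and alpha values are assumptions."""
import numpy as np

# Constants quoted in the table above.
GAMMA = 0.3            # discount factor
OBS_FREQ_HZ = 1_000    # observation frequency, 1 kHz
H = 1.0 / OBS_FREQ_HZ  # seconds between observations (1 ms)

# Hyperparameters named in Algorithm 1's "Require" lines; values are assumed.
TAU = 1e-2    # WGF time parameter (placeholder value)
ALPHA = 0.1   # learning rate (placeholder value)

N_PARTICLES = 64
rng = np.random.default_rng(0)


def particle_update(particles, reward, next_particles):
    """One generic distributional TD-style step on a particle ensemble.

    `particles` approximates the return distribution at the current state,
    `next_particles` the distribution at the state observed H seconds later.
    The tau-scaled noise is a Langevin-style smoothing term standing in for
    the WGF time parameter's role; it is not the finite-difference WGF update.
    """
    discount = GAMMA ** H  # continuous-time discounting over one interval
    targets = reward * H + discount * rng.choice(next_particles, size=particles.shape)
    drift = ALPHA * (targets - particles)  # pull particles toward targets
    noise = np.sqrt(2.0 * TAU) * rng.standard_normal(particles.shape)
    return particles + drift + noise


# Usage: return-distribution particles for two successive observations.
z_now = rng.normal(0.0, 1.0, N_PARTICLES)
z_next = rng.normal(0.5, 1.0, N_PARTICLES)
z_now = particle_update(z_now, reward=1.0, next_particles=z_next)
```

A faithful reproduction would still require the paper's full Algorithm 1; the sketch only makes explicit where the constants quoted in the Experiment Setup row would enter a simulation running at the stated 1 kHz observation frequency.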