Value Iteration in Continuous Actions, States and Time
Authors: Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in non-linear control experiments that the dynamic programming solution obtains the same quantitative performance as deep reinforcement learning methods in simulation but excels when transferred to the physical system. Section 5 (Experiments): In the following the experimental setup and results are described. |
| Researcher Affiliation | Collaboration | ¹NVIDIA, ²Technical University of Darmstadt, ³Technion, Israel Institute of Technology, ⁴University of Washington, ⁵University of Toronto & Vector Institute. |
| Pseudocode | Yes | Algorithm 1: Continuous Fitted Value Iteration (cFVI) (an illustrative sketch of such a loop follows this table). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | For the physical systems the dynamics model of the manufacturer is used (Quanser, 2018). |
| Dataset Splits | No | The paper specifies initial state distributions for training and evaluation but does not explicitly mention distinct training/validation/test dataset splits. |
| Hardware Specification | Yes | Wall-clock time on an AMD 3900X and a NVIDIA RTX 3090 |
| Software Dependencies | No | The paper thanks open-source projects like SimuRLacra, MushroomRL, NumPy, and PyTorch but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | In practice we treat β as a hyperparameter and select T such that the weight of R_T is exp(−βT) ≈ 10^-4 (a worked reading of this rule follows the table). DP cFVI works best with λ ∈ [0.85, 0.95] and RTDP cFVI with λ ∈ [0.45, 0.55]. The learning curves, averaged over 5 seeds, for different n-step value targets are shown in Figure 7. |
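
The "Pseudocode" row above points to Algorithm 1 (cFVI). To make that reference concrete, here is a minimal PyTorch sketch of a fitted value iteration loop of that general shape. It is not the authors' implementation: the toy pendulum dynamics, the quadratic costs, the network size, and all hyperparameters below are illustrative assumptions, and the paper's exponentially weighted n-step value targets and bounded-action policy are simplified to a one-step target and a clipped greedy action.

```python
# Minimal sketch of a continuous fitted value iteration loop in the spirit of
# Algorithm 1 (cFVI). NOT the authors' code: dynamics, costs, network size, and
# hyperparameters are illustrative assumptions; n-step targets are reduced to 1-step.
import torch
import torch.nn as nn


class ValueNet(nn.Module):
    def __init__(self, state_dim: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def dynamics(x: torch.Tensor, u: torch.Tensor, dt: float = 0.02) -> torch.Tensor:
    """Toy control-affine pendulum: state x = (theta, omega), scalar torque u."""
    theta, omega = x[:, :1], x[:, 1:]
    omega_next = omega + dt * (9.81 * torch.sin(theta) + u)
    return torch.cat([theta + dt * omega_next, omega_next], dim=-1)


def reward(x: torch.Tensor, u: torch.Tensor, dt: float = 0.02) -> torch.Tensor:
    """Separable reward: negative quadratic state cost minus quadratic action cost."""
    return -dt * (x[:, :1] ** 2 + 0.1 * x[:, 1:] ** 2 + 0.1 * u ** 2)


def greedy_action(value_net: nn.Module, x: torch.Tensor, u_max: float = 5.0) -> torch.Tensor:
    """Analytic greedy action for a quadratic action cost: u* is proportional to
    B(x)^T dV/dx; here the torque only enters the angular acceleration, so this
    reduces to dV/d(omega)."""
    x = x.clone().requires_grad_(True)
    dVdx, = torch.autograd.grad(value_net(x).sum(), x)
    u = dVdx[:, 1:] / (2.0 * 0.1)          # gradient of the conjugate action cost
    return u.clamp(-u_max, u_max).detach()


value_net = ValueNet()
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
gamma = 0.99

for sweep in range(200):                            # value iteration sweeps
    x = torch.empty(512, 2).uniform_(-3.0, 3.0)     # states sampled from a fixed distribution (DP-style)
    u = greedy_action(value_net, x)
    with torch.no_grad():
        target = reward(x, u) + gamma * value_net(dynamics(x, u))   # 1-step value target
    for _ in range(20):                              # regress V onto the frozen targets
        loss = (value_net(x) - target).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The structure mirrors the table's description of the method: states are drawn from a sampling distribution, the greedy action is obtained from the value function gradient rather than by policy optimization, and the value network is then regressed onto the resulting targets.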
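The horizon rule quoted in the "Experiment Setup" row can be read off directly (the numeric β below is only an illustration, not a value from the paper): solving exp(−βT) ≈ 10^-4 for T gives T ≈ ln(10^4)/β ≈ 9.21/β, so a discounting rate of, say, β = 2 s⁻¹ would correspond to a horizon of roughly T ≈ 4.6 s.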