Value Iteration in Continuous Actions, States and Time

Authors: Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show in non-linear control experiments that the dynamic programming solution obtains the same quantitative performance as deep reinforcement learning methods in simulation but excels when transferred to the physical system." "5. Experiments: In the following the experimental setup and results are described."
Researcher Affiliation | Collaboration | "1 NVIDIA, 2 Technical University of Darmstadt, 3 Technion, Israel Institute of Technology, 4 University of Washington, 5 University of Toronto & Vector Institute"
Pseudocode | Yes | "Algorithm 1: Continuous Fitted Value Iteration (cFVI)" (a hedged sketch of such a loop is given after the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "For the physical systems the dynamics model of the manufacturer is used (Quanser, 2018)."
Dataset Splits | No | The paper specifies initial state distributions for training and evaluation but does not explicitly mention distinct training/validation/test dataset splits.
Hardware Specification | Yes | "Wall-clock time on an AMD 3900X and a NVIDIA RTX 3090"
Software Dependencies | No | The paper thanks open-source projects such as SimuRLacra, MushroomRL, NumPy, and PyTorch but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "In practice we treat β as a hyperparameter and select T such that the weight of R_T is exp(−βT) ≈ 10^-4. DP cFVI works best with λ ∈ [0.85, 0.95] and RTDP cFVI with λ ∈ [0.45, 0.55]. The learning curves, averaged over 5 seeds, for different n-step value targets are shown in Figure 7." (the horizon choice and λ-weighting are illustrated after the table)
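The Pseudocode row above refers to Algorithm 1 (cFVI). Below is a minimal sketch of a fitted value-iteration loop in that spirit, not the authors' implementation: it assumes control-affine dynamics ẋ = a(x) + B(x)u and a quadratic action cost uᵀRu, for which the greedy action has the closed form u* = ½ R⁻¹ B(x)ᵀ ∇ₓV(x). The names value_net, a_fn, B_fn, q_fn, R_inv and all hyperparameter values are placeholders, and a one-step target is used for simplicity where the paper uses exponentially weighted n-step targets.

```python
import torch

def fit_value_function(value_net, a_fn, B_fn, q_fn, R_inv, states,
                       dt=1.0 / 125.0, gamma=0.99, n_iter=100, n_fit=20):
    """Sketch of fitted value iteration: alternate between computing value
    targets with the analytic greedy action and regressing the value net.
    gamma roughly plays the role of exp(-beta * dt) in the continuous-time
    formulation."""
    opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    for _ in range(n_iter):
        # 1) Greedy action in closed form from the current value gradient
        #    (quadratic action cost assumed: u* = 0.5 * R^-1 B(x)^T dV/dx).
        x = states.clone().requires_grad_(True)
        dVdx = torch.autograd.grad(value_net(x).sum(), x)[0]       # [N, dim_x]
        Bt_dV = torch.einsum('nij,ni->nj', B_fn(x), dVdx)          # [N, dim_u]
        u = 0.5 * torch.einsum('ij,nj->ni', R_inv, Bt_dV)          # [N, dim_u]

        # 2) One-step value target from the known control-affine model.
        with torch.no_grad():
            x_next = x + dt * (a_fn(x) + torch.einsum('nij,nj->ni', B_fn(x), u))
            target = q_fn(x, u) + gamma * value_net(x_next)

        # 3) Fit the value network to the targets by regression.
        x_in = x.detach()
        for _ in range(n_fit):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(value_net(x_in), target)
            loss.backward()
            opt.step()
    return value_net
```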
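The Experiment Setup row quotes two hyperparameter choices: the horizon T is chosen so that exp(−βT) ≈ 10^-4, and the n-step value targets are weighted with λ ∈ [0.85, 0.95] for DP cFVI or λ ∈ [0.45, 0.55] for RTDP cFVI. The snippet below only illustrates this arithmetic; the β value and the particular λ-return weighting are illustrative assumptions, not numbers or code taken from the paper.

```python
import math

# Horizon choice: pick T so that the terminal weight exp(-beta * T) ~ 1e-4.
# beta = 2.0 is an arbitrary example value, not one reported in the paper.
beta = 2.0
T = -math.log(1e-4) / beta                       # exp(-beta * T) == 1e-4
print(f"T = {T:.2f} s, terminal weight = {math.exp(-beta * T):.1e}")

# Assumed lambda-return-style weighting of truncated n-step targets:
# geometric weights that sum to one over n = 1..n_steps.
def lambda_weights(lam, n_steps):
    w = [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_steps)]
    w.append(lam ** (n_steps - 1))               # remaining mass on the last target
    return w

print(lambda_weights(0.90, 5))                   # e.g. a DP cFVI-style setting
print(lambda_weights(0.50, 5))                   # e.g. an RTDP cFVI-style setting
```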