Value Iteration in Continuous Actions, States and Time

Authors: Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show in non-linear control experiments that the dynamic programming solution obtains the same quantitative performance as deep reinforcement learning methods in simulation but excels when transferred to the physical system." "5. Experiments: In the following the experimental setup and results are described."
Researcher Affiliation | Collaboration | "1 NVIDIA, 2 Technical University of Darmstadt, 3 Technion, Israel Institute of Technology, 4 University of Washington, 5 University of Toronto & Vector Institute"
Pseudocode | Yes | "Algorithm 1: Continuous Fitted Value Iteration (cFVI)" (a hedged sketch of such a loop is given after the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "For the physical systems the dynamics model of the manufacturer is used (Quanser, 2018)."
Dataset Splits | No | The paper specifies initial state distributions for training and evaluation but does not explicitly mention distinct training/validation/test dataset splits.
Hardware Specification | Yes | "Wall-clock time on an AMD 3900X and a NVIDIA RTX 3090"
Software Dependencies | No | The paper thanks open-source projects such as SimuRLacra, MushroomRL, NumPy, and PyTorch but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "In practice we treat β as a hyperparameter and select T such that the weight of R_T is exp(−βT) ≈ 10^-4. DP cFVI works best with λ ∈ [0.85, 0.95] and RTDP cFVI with λ ∈ [0.45, 0.55]. The learning curves, averaged over 5 seeds, for different n-step value targets are shown in Figure 7." (the horizon choice and λ-weighting are illustrated after the table)
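The Pseudocode row above refers to Algorithm 1 (cFVI). Below is a minimal sketch of a fitted value-iteration loop in that spirit, not the authors' implementation: it assumes control-affine dynamics ẋ = a(x) + B(x)u and a quadratic action cost uᵀRu, for which the greedy action has the closed form u* = ½ R⁻¹ B(x)ᵀ ∇ₓV(x). The names value_net, a_fn, B_fn, q_fn, R_inv and all hyperparameter values are placeholders, and a one-step target is used for simplicity where the paper uses exponentially weighted n-step targets.

```python
import torch

def fit_value_function(value_net, a_fn, B_fn, q_fn, R_inv, states,
                       dt=1.0 / 125.0, gamma=0.99, n_iter=100, n_fit=20):
    """Sketch of fitted value iteration: alternate between computing value
    targets with the analytic greedy action and regressing the value net.
    gamma roughly plays the role of exp(-beta * dt) in the continuous-time
    formulation."""
    opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    for _ in range(n_iter):
        # 1) Greedy action in closed form from the current value gradient
        #    (quadratic action cost assumed: u* = 0.5 * R^-1 B(x)^T dV/dx).
        x = states.clone().requires_grad_(True)
        dVdx = torch.autograd.grad(value_net(x).sum(), x)[0]       # [N, dim_x]
        Bt_dV = torch.einsum('nij,ni->nj', B_fn(x), dVdx)          # [N, dim_u]
        u = 0.5 * torch.einsum('ij,nj->ni', R_inv, Bt_dV)          # [N, dim_u]

        # 2) One-step value target from the known control-affine model.
        with torch.no_grad():
            x_next = x + dt * (a_fn(x) + torch.einsum('nij,nj->ni', B_fn(x), u))
            target = q_fn(x, u) + gamma * value_net(x_next)

        # 3) Fit the value network to the targets by regression.
        x_in = x.detach()
        for _ in range(n_fit):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(value_net(x_in), target)
            loss.backward()
            opt.step()
    return value_net
```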
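The Experiment Setup row quotes two hyperparameter choices: the horizon T is chosen so that exp(−βT) ≈ 10^-4, and the n-step value targets are weighted with λ ∈ [0.85, 0.95] for DP cFVI or λ ∈ [0.45, 0.55] for RTDP cFVI. The snippet below only illustrates this arithmetic; the β value and the particular λ-return weighting are illustrative assumptions, not numbers or code taken from the paper.

```python
import math

# Horizon choice: pick T so that the terminal weight exp(-beta * T) ~ 1e-4.
# beta = 2.0 is an arbitrary example value, not one reported in the paper.
beta = 2.0
T = -math.log(1e-4) / beta                       # exp(-beta * T) == 1e-4
print(f"T = {T:.2f} s, terminal weight = {math.exp(-beta * T):.1e}")

# Assumed lambda-return-style weighting of truncated n-step targets:
# geometric weights that sum to one over n = 1..n_steps.
def lambda_weights(lam, n_steps):
    w = [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_steps)]
    w.append(lam ** (n_steps - 1))               # remaining mass on the last target
    return w

print(lambda_weights(0.90, 5))                   # e.g. a DP cFVI-style setting
print(lambda_weights(0.50, 5))                   # e.g. an RTDP cFVI-style setting
```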