reproducibilityindex.ai

Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces

Authors: Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the validity of the presented framework through experiments. We verify the validity of the framework on the classical Mountain Car problem and a simulated inverted pendulum.
Researcher Affiliation	Academia	Motoya Ohnishi Keio Univ., KTH, RIKEN motoya.ohnishi@riken.jp Masahiro Yukawa Keio Univ., RIKEN yukawa@elec.keio.ac.jp Mikael Johansson KTH mikaelj@ee.kth.se Masashi Sugiyama RIKEN, Univ. Tokyo masashi.sugiyama@riken.jp
Pseudocode	Yes	Algorithm 1 Model-based CT-VF Approximation in RKHSs with Barrier-Certified Policy Updates
Open Source Code	No	The paper mentions using external code: 'We used the code in https://github.com/udacity/deep-reinforcement-learning/blob/master/crossentropy/CEM.ipynb offered by Udacity.' However, there is no statement or link indicating that the authors' own source code for the proposed methodology is openly available.
Open Datasets	Yes	We show that our CT approaches are advantageous over DT counterparts in terms of susceptibility to errors, by using MountainCarContinuous-v0 in OpenAI Gym [7] as the environment.
Dataset Splits	No	The paper describes the setup for simulated environments (Mountain Car, inverted pendulum) including control cycles and episode termination conditions. However, it does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets) as it relies on simulation and continuous interaction rather than pre-partitioned static datasets.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications.
Software Dependencies	No	The paper states 'The code is based on PyTorch [31].' While PyTorch is mentioned, a specific version number is not provided, and no other software dependencies with version numbers are listed.
Experiment Setup	Yes	In the simulation, the control cycle (i.e., the frequency of applying control inputs and observing the states and costs) is set to 1.0 second. The observed immediate cost is given by R(x(t), u(t)) + ε = 1 + 0.001u²(t) + ε for x(t) < 0.45 and R(x(t), u(t)) + ε = 0.001u²(t) + ε for x(t) ≥ 0.45, where ε ~ N(0, 0.1²). Here, the barrier function is given by b(x) = 0.05 + v, which prevents the velocity from becoming lower than −0.05. Figure 3 compares the value functions learned by each method for the time intervals Δt = 20.0 and Δt = 1.0.
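
The cost and barrier function quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the noise term is zero-mean Gaussian with standard deviation 0.1, that the second cost case applies when the position is at or above 0.45, and that the barrier is satisfied when b(x) is non-negative. All function and constant names are hypothetical.

```python
import random

# Values taken from the quoted setup; the names are hypothetical.
GOAL_POSITION = 0.45   # position threshold separating the two cost cases
CONTROL_WEIGHT = 0.001 # weight on the squared control input
NOISE_STD = 0.1        # assumed standard deviation of the observation noise
BARRIER_OFFSET = 0.05  # offset in b(x) = 0.05 + v


def immediate_cost(position, control):
    """Noise-free cost R(x, u): 1 + 0.001 u^2 before the goal, 0.001 u^2 after."""
    base = 0.0 if position >= GOAL_POSITION else 1.0
    return base + CONTROL_WEIGHT * control ** 2


def observed_cost(position, control, rng=random):
    """Cost as observed by the learner: R(x, u) plus assumed Gaussian noise."""
    return immediate_cost(position, control) + rng.gauss(0.0, NOISE_STD)


def barrier(velocity):
    """Barrier function b(x) = 0.05 + v; assumed satisfied when non-negative."""
    return BARRIER_OFFSET + velocity


def is_safe(velocity):
    """True when the velocity is not lower than -0.05."""
    return barrier(velocity) >= 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable noise
    print(immediate_cost(0.0, 1.0))
    print(observed_cost(0.0, 1.0, rng))
    print(is_safe(-0.1), is_safe(0.02))
```

Passing a seeded `random.Random` instance as `rng` keeps the noisy observations repeatable, which helps when comparing runs with different time intervals.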