Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces

Authors: Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the validity of the presented framework through experiments. We verify the validity of the framework on the classical Mountain Car problem and a simulated inverted pendulum.
Researcher Affiliation | Academia | Motoya Ohnishi (Keio Univ., KTH, RIKEN; motoya.ohnishi@riken.jp); Masahiro Yukawa (Keio Univ., RIKEN; yukawa@elec.keio.ac.jp); Mikael Johansson (KTH; mikaelj@ee.kth.se); Masashi Sugiyama (RIKEN, Univ. Tokyo; masashi.sugiyama@riken.jp)
Pseudocode | Yes | Algorithm 1: Model-based CT-VF Approximation in RKHSs with Barrier-Certified Policy Updates
Open Source Code | No | The paper mentions using external code: 'We used the code in https://github.com/udacity/deep-reinforcement-learning/blob/master/crossentropy/CEM.ipynb offered by Udacity.' However, there is no statement or link indicating that the authors' own source code for the proposed methodology is openly available.
Open Datasets | Yes | We show that our CT approaches are advantageous over DT counterparts in terms of susceptibility to errors, by using MountainCarContinuous-v0 in OpenAI Gym [7] as the environment.
Dataset Splits | No | The paper describes the setup for the simulated environments (Mountain Car, inverted pendulum), including control cycles and episode termination conditions. However, it does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets), because it relies on simulation and continuous interaction rather than pre-partitioned static datasets.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper states 'The code is based on PyTorch [31].' While PyTorch is mentioned, no version number is given, and no other software dependencies with version numbers are listed.
Experiment Setup | Yes | In the simulation, the control cycle (i.e., the frequency of applying control inputs and observing the states and costs) is set to 1.0 second. The observed immediate cost is given by $R(x(t), u(t)) + \epsilon = 1 + 0.001u^2(t) + \epsilon$ for $x(t) < 0.45$ and $R(x(t), u(t)) + \epsilon = 0.001u^2(t) + \epsilon$ for $x(t) \geq 0.45$, where $\epsilon \sim \mathcal{N}(0, 0.1^2)$. Here, the barrier function is given by $b(x) = 0.05 + v$, which prevents the velocity $v$ from becoming lower than $-0.05$. Figure 3 compares the value functions learned by each method for the time intervals $\Delta t = 20.0$ and $\Delta t = 1.0$.
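To make the Mountain Car experiment setup concrete, the sketch below encodes the observed immediate cost and the velocity barrier function as plain Python functions. It is an illustrative sketch, not the authors' code. The names `expected_cost`, `observed_cost`, `barrier`, and `is_safe` are hypothetical. The noise standard deviation of 0.1 assumes the noise term is $\mathcal{N}(0, 0.1^2)$. The safety test assumes the common barrier-function convention that the safe set is $\{x : b(x) \geq 0\}$, which for $b(x) = 0.05 + v$ means $v \geq -0.05$.

```python
import random

GOAL_POSITION = 0.45    # position threshold separating the two cost cases
CONTROL_WEIGHT = 0.001  # weight on the squared control input u^2
NOISE_STD = 0.1         # assumption: observation noise eps ~ N(0, 0.1^2)
VELOCITY_MARGIN = 0.05  # constant term in the barrier b(x) = 0.05 + v


def expected_cost(position: float, u: float) -> float:
    """Noise-free immediate cost R(x, u).

    A unit running cost is charged while the car is below the goal
    position; the control penalty 0.001 * u^2 is charged everywhere.
    """
    running = 1.0 if position < GOAL_POSITION else 0.0
    return running + CONTROL_WEIGHT * u ** 2


def observed_cost(position: float, u: float, rng: random.Random) -> float:
    """Observed cost R(x, u) + eps, with eps drawn from N(0, NOISE_STD^2)."""
    return expected_cost(position, u) + rng.gauss(0.0, NOISE_STD)


def barrier(velocity: float) -> float:
    """Barrier function b(x) = 0.05 + v (depends only on the velocity)."""
    return VELOCITY_MARGIN + velocity


def is_safe(velocity: float) -> bool:
    """Assumed convention: a state is safe when b(x) >= 0, i.e. v >= -0.05."""
    return barrier(velocity) >= 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    print(expected_cost(0.0, 1.0))       # below the goal: 1 + 0.001 * 1^2
    print(observed_cost(0.0, 1.0, rng))  # same cost plus Gaussian noise
    print(is_safe(-0.06))                # velocity below -0.05 violates b(x) >= 0
```

Separating the noise-free cost from the noisy observation makes it easy to check that the sample mean of `observed_cost` converges to `expected_cost`.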