Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces
Authors: Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the validity of the presented framework through experiments. We verify the validity of the framework on the classical Mountain Car problem and a simulated inverted pendulum. |
| Researcher Affiliation | Academia | Motoya Ohnishi Keio Univ., KTH, RIKEN motoya.ohnishi@riken.jp Masahiro Yukawa Keio Univ., RIKEN yukawa@elec.keio.ac.jp Mikael Johansson KTH mikaelj@ee.kth.se Masashi Sugiyama RIKEN, Univ. Tokyo masashi.sugiyama@riken.jp |
| Pseudocode | Yes | Algorithm 1 Model-based CT-VF Approximation in RKHSs with Barrier-Certified Policy Updates |
| Open Source Code | No | The paper mentions using external code: 'We used the code in https://github.com/udacity/deep-reinforcement-learning/blob/master/crossentropy/CEM.ipynb offered by Udacity.' However, there is no statement or link indicating that the authors' own source code for the proposed methodology is openly available. |
| Open Datasets | Yes | We show that our CT approaches are advantageous over DT counterparts in terms of susceptibility to errors, by using MountainCarContinuous-v0 in OpenAI Gym [7] as the environment. |
| Dataset Splits | No | The paper describes the setup for simulated environments (Mountain Car, inverted pendulum) including control cycles and episode termination conditions. However, it does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and test sets) as it relies on simulation and continuous interaction rather than pre-partitioned static datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper states 'The code is based on PyTorch [31].' While PyTorch is mentioned, a specific version number is not provided, and no other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | In the simulation, the control cycle (i.e., the frequency of applying control inputs and observing the states and costs) is set to 1.0 second. The observed immediate cost is given by $R(x(t), u(t)) + \epsilon = 1 + 0.001u^2(t) + \epsilon$ for $x(t) < 0.45$ and $R(x(t), u(t)) + \epsilon = 0.001u^2(t) + \epsilon$ for $x(t) \ge 0.45$, where $\epsilon \sim \mathcal{N}(0, 0.1^2)$. Here, the barrier function is given by $b(x) = 0.05 + v$, which prevents the velocity from becoming lower than $-0.05$. Figure 3 compares the value functions learned by each method for the time intervals $\Delta t = 20.0$ and $\Delta t = 1.0$. |
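
The cost and barrier quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the noise term is zero-mean Gaussian with standard deviation 0.1 (reading the extracted "0.12" as 0.1 squared), and that the barrier b(x) = 0.05 + v is treated as satisfied when it is nonnegative, i.e. when the velocity v is at least -0.05. All function and constant names are hypothetical.

```python
import random

# Values taken from the quoted setup; the names are illustrative only.
GOAL_POSITION = 0.45     # cost drops once the position reaches this value
CONTROL_WEIGHT = 0.001   # weight on the squared control input
NOISE_STD = 0.1          # assumed std of the observation noise
VELOCITY_MARGIN = 0.05   # constant term in b(x) = 0.05 + v


def immediate_cost(position, control):
    """Noise-free cost R(x, u): control penalty, plus 1 before the goal."""
    control_penalty = CONTROL_WEIGHT * control ** 2
    if position < GOAL_POSITION:
        return 1.0 + control_penalty
    return control_penalty


def observed_cost(position, control, rng=random):
    """Cost as observed by the learner: R(x, u) plus Gaussian noise."""
    return immediate_cost(position, control) + rng.gauss(0.0, NOISE_STD)


def barrier(velocity):
    """Barrier value b(x) = 0.05 + v; nonnegative iff v >= -0.05."""
    return VELOCITY_MARGIN + velocity


def is_safe(velocity):
    """Assumed safety check: the state is admissible when b(x) >= 0."""
    return barrier(velocity) >= 0.0
```

Passing a seeded `random.Random` instance as `rng` makes the noisy observations repeatable, which is convenient when comparing runs at different time intervals.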