Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control
Authors: Wenhan Cao, Wei Pan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These theoretical findings are finally validated by two canonical control tasks. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, University of Manchester; ²School of Vehicle and Mobility, Tsinghua University |
| Pseudocode | No | The paper describes theoretical models and iterative processes, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/anonymity678/Computation-Impacts-Control.git. |
| Open Datasets | No | The paper defines and uses "canonical linear-quadratic regulator problem" and a "canonical nonlinear system" as examples. These are problem definitions for simulation, not references to or access information for publicly available datasets with formal citations or links. |
| Dataset Splits | No | The paper defines and evaluates on specific control tasks, but it does not describe standard dataset splits like train/validation/test percentages or sample counts. |
| Hardware Specification | No | The paper mentions running "simulations" but provides no specific details about the hardware (e.g., CPU, GPU models, memory) used for these experiments. |
| Software Dependencies | No | The paper describes mathematical methods and algorithms (e.g., trapezoidal rule, BQ with Matérn kernel, Euler method, Runge-Kutta series) but does not specify any particular software names with version numbers (e.g., Python 3.x, PyTorch 1.x, MATLAB R20xx) used in the experiments. |
| Experiment Setup | Yes | The initial policy of the PI is chosen as an admissible policy u = K0x with K0 = [0, 0, 0]. We use the trapezoidal rule and the BQ with Matérn kernel (smoothness parameter b = 4) for evenly spaced samples with size 5 ≤ N ≤ 15 to compute the PEV step of IntRL. |
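
The experiment setup above refers to the trapezoidal rule applied to evenly spaced samples of size 5 ≤ N ≤ 15. As a rough illustration of how the sample count affects the accuracy of such an integral, the sketch below applies the composite trapezoidal rule to a toy integrand. The integrand `running_cost`, the horizon `T`, and the helper `trapezoid` are hypothetical choices made for this example; they are not taken from the paper or its repository, and the sketch does not reproduce the paper's control tasks or its BQ variant.

```python
import math


def trapezoid(values, dt):
    """Composite trapezoidal rule for evenly spaced samples with spacing dt."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def running_cost(t):
    # Hypothetical integrand (assumption for illustration only):
    # the cost x(t)^2 along the toy trajectory x(t) = exp(-t).
    return math.exp(-2.0 * t)


T = 1.0  # assumed integration horizon
exact = 0.5 * (1.0 - math.exp(-2.0 * T))  # closed-form integral of exp(-2t) on [0, T]

errors = {}
for n in range(5, 16):  # sample sizes 5 <= N <= 15, as in the setup row
    dt = T / (n - 1)
    samples = [running_cost(i * dt) for i in range(n)]
    errors[n] = abs(trapezoid(samples, dt) - exact)

for n, err in errors.items():
    print(f"N={n:2d}  abs error={err:.2e}")
```

For this smooth toy integrand the absolute error shrinks as N grows, roughly in proportion to the square of the sample spacing, which is the standard behavior of the trapezoidal rule.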