Online Control with Adversarial Disturbance for Continuous-time Linear Systems
Authors: Jingwei Li, Jing Dong, Can Chang, Baoxiang Wang, Jingzhao Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we apply our theoretical analysis to the practical training of agents. First we highlight the key difference between our algorithm and traditional online policy optimization. ... We conduct experiments on the hopper, halfcheetah, and walker2d benchmarks using the MuJoCo simulator [40]. |
| Researcher Affiliation | Academia | Jingwei Li (IIIS, Tsinghua University; Shanghai Qizhi Institute; ljw22@mails.tsinghua.edu.cn), Jing Dong (The Chinese University of Hong Kong, Shenzhen; jingdong@link.cuhk.edu.cn), Can Chang (IIIS, Tsinghua University; cc22@mails.tsinghua.edu.cn), Baoxiang Wang (The Chinese University of Hong Kong, Shenzhen; bxiangwang@cuhk.edu.cn), Jingzhao Zhang (IIIS, Tsinghua University; Shanghai Qizhi Institute; jingzhaoz@mail.tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1 Continuous two-level online control algorithm |
| Open Source Code | No | Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Our code is very simple, just use the traditional SAC algorithm with one line implement. Our main contribution is the theoretical analysis. |
| Open Datasets | Yes | We conduct experiments on the hopper, halfcheetah, and walker2d benchmarks using the MuJoCo simulator [40]. ... Table 1: The DR distributions of environment. |
| Dataset Splits | No | The paper discusses training and testing but does not provide specific details on training/test/validation dataset splits (e.g., percentages, sample counts, or explicit cross-validation setup). |
| Hardware Specification | Yes | We conducted experiments using NVIDIA A40 graphics card. |
| Software Dependencies | No | The paper mentions the use of 'MuJoCo simulator [40]' and 'SAC (Soft Actor-Critic) algorithm [31]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Therefore, in the following experiments we fix the parameter h = 3, m = 3. We train our algorithm with this parameter and standard SAC on hopper and test the performance on more environments. |