Information Theoretic Regret Bounds for Online Nonlinear Control

Authors: Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LC3 on three domains: a set of continuous control tasks, a maze environment that requires exploration, and a dexterous manipulation task. Throughout these experiments, we use model predictive path integral control (MPPI) [Williams et al., 2017] for planning and posterior sampling [Chapelle and Li, 2011, Russo and Van Roy, 2014] for exploration; we do not implement the optimism version of LC3 as analyzed, but rather a Thompson sampling (TS) variation. A Bayesian regret bound for TS is plausible using the framework developed by Russo and Van Roy [2014]. The algorithms are implemented in the Lyceum framework in the Julia programming language [Summers et al., 2020, Bezanson et al., 2017]. Comparison algorithms are provided by Wang and Ba [2019] and Wang et al. [2019]. Note that these experiments use reward (negative cost) for evaluations. Further details of the experiments in this section can be found in Appendix D. Benchmark Tasks with Random Features: We use common benchmark tasks, including MuJoCo [Todorov et al., 2012] environments from OpenAI Gym [Brockman et al., 2016]. We use Random Fourier Features (RFF) [Rahimi and Recht, 2008] to represent φ. Table 1 shows the final performance (at 200k timesteps) of LC3 with RFFs for six environments, and includes its ranking against the benchmark results from Wang et al. [2019]. We find that LC3 consistently performs well on simple continuous control tasks, and it works well even without posterior sampling. When the dynamical complexity increases, such as with the contact-rich Hopper model, our method's performance suffers, suggesting that these scenarios require a different feature representation. (A hedged code sketch of the RFF feature map and the posterior-sampling model update appears after this table.)
Researcher Affiliation | Collaboration | Sham Kakade (University of Washington; Microsoft Research NYC), Akshay Krishnamurthy (Microsoft Research NYC), Kendall Lowrey (University of Washington), Motoya Ohnishi (University of Washington), Wen Sun (Cornell University)
Pseudocode | Yes | Algorithm 1: Lower Confidence-based Continuous Control (LC3). (A structural sketch of the episodic loop appears after this table.)
Open Source Code | Yes | Project page: https://sites.google.com/view/lc3algorithm/
Open Datasets | Yes | We use some common benchmark tasks, including MuJoCo [Todorov et al., 2012] environments from OpenAI Gym [Brockman et al., 2016].
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages, counts, or references to predefined splits for reproducibility. It discusses "episodes" in the online learning context, but not traditional dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments.
Software Dependencies | No | The paper mentions the "Lyceum framework under the Julia programming language" and "MuJoCo environments from OpenAI Gym", but it does not specify concrete version numbers for these software components or for any other libraries used, which limits reproducibility.
Experiment Setup | No | The paper mentions general aspects of the experimental setup, such as using MPPI for planning and RFFs for features, and refers to parameters like the "regularizer λ" and the "confidence parameter C1". However, it does not explicitly provide specific numerical values for hyperparameters or other detailed training configurations in the main text, stating instead that "Further details of the experiments in this section can be found in Appendix D."
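
The Research Type row above describes the experimental pipeline only at a high level. As a concrete illustration, here is a minimal sketch of the two model-side ingredients it names: a Random Fourier Feature map for φ [Rahimi and Recht, 2008] and a posterior-sampling update over linear dynamics in feature space (the paper's model assumption is x' = W*φ(x, u) + Gaussian noise). All dimensions, the bandwidth, the regularizer `lam`, and the noise scale `sigma` are illustrative assumptions, not values from the paper, which defers such details to Appendix D.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Random Fourier Features (Rahimi & Recht, 2008) approximating an RBF kernel ---
# Dimensions and unit bandwidth are illustrative placeholders, not the paper's settings.
d_in, d_feat = 5, 256                                # input dim (state + action), feature dim
W_rff = rng.normal(scale=1.0, size=(d_feat, d_in))   # random frequencies
b_rff = rng.uniform(0.0, 2 * np.pi, size=d_feat)     # random phases

def phi(x, u):
    """Feature map phi(x, u) as a random cosine projection of the state-action pair."""
    z = np.concatenate([x, u])
    return np.sqrt(2.0 / d_feat) * np.cos(W_rff @ z + b_rff)

# --- Posterior sampling over linear dynamics in feature space ---
# lam (ridge regularizer) and sigma (noise scale) are assumed values; the paper
# calls the regularizer "lambda" but does not state a number in the main text.
lam, sigma, d_state = 1e-3, 0.1, 3
Sigma = lam * np.eye(d_feat)      # regularized feature second-moment matrix
B = np.zeros((d_feat, d_state))   # accumulated phi * x_next^T cross terms

def update(transitions):
    """Fold (x, u, x_next) triples into the least-squares statistics."""
    for x, u, x_next in transitions:
        f = phi(x, u)
        Sigma[...] += np.outer(f, f)
        B[...] += np.outer(f, x_next)

def sample_model():
    """Draw one dynamics matrix from the Gaussian posterior around the ridge estimate."""
    Sigma_inv = np.linalg.inv(Sigma)
    W_mean = Sigma_inv @ B                 # ridge / regularized least-squares estimate
    cov = sigma**2 * Sigma_inv             # shared column covariance under isotropic noise
    cols = [rng.multivariate_normal(W_mean[:, j], cov) for j in range(d_state)]
    return np.stack(cols, axis=1)          # shape (d_feat, d_state)
```

With isotropic Gaussian noise, the columns of the dynamics matrix are conditionally independent given the data and share the covariance σ²Σ⁻¹, which is why the draw above samples each column separately.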
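And a structural sketch of the episodic loop, corresponding to the Thompson-sampling variant the experiments actually run rather than Algorithm 1's optimistic planning over a confidence ball. `env`, `mppi_plan`, and the fixed-horizon handling are hypothetical interfaces, not the paper's API; `update` and `sample_model` refer to the previous sketch.

```python
def lc3_thompson(env, mppi_plan, n_episodes, horizon):
    """Posterior-sampling control loop (a sketch, not Algorithm 1 verbatim).

    Per episode: draw a dynamics model from the posterior, plan against it
    with MPPI at every step, then fold the observed transitions back into
    the least-squares statistics before the next episode.
    """
    for _ in range(n_episodes):
        W = sample_model()                  # one posterior draw per episode
        x = env.reset()
        transitions = []
        for _ in range(horizon):
            u = mppi_plan(W, x)             # plan against the sampled dynamics
            x_next = env.step(u)
            transitions.append((x, u, x_next))
            x = x_next
        update(transitions)                 # refresh Sigma and B
```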