Information Theoretic Regret Bounds for Online Nonlinear Control

Authors: Sham Kakade, Akshay Krishnamurthy, Kendall Lowrey, Motoya Ohnishi, Wen Sun

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LC3 on three domains: a set of continuous control tasks, a maze environment that requires exploration, and a dexterous manipulation task. Throughout these experiments, we use model predictive path integral control (MPPI) [Williams et al., 2017] for planning and posterior sampling [Chapelle and Li, 2011, Russo and Van Roy, 2014] for exploration; we do not implement the optimism version of LC3 as analyzed, but rather a Thompson sampling (TS) variation. A Bayesian regret bound for TS is plausible using the framework developed by Russo and Van Roy [2014]. The algorithms are implemented in the Lyceum framework in the Julia programming language [Summers et al., 2020, Bezanson et al., 2017]. Comparison algorithms are provided by Wang and Ba [2019] and Wang et al. [2019]. Note that these experiments use reward (negative cost) for evaluations. Further details of the experiments in this section can be found in Appendix D. Benchmark Tasks with Random Features: We use common benchmark tasks, including MuJoCo [Todorov et al., 2012] environments from OpenAI Gym [Brockman et al., 2016]. We use Random Fourier Features (RFF) [Rahimi and Recht, 2008] to represent φ. Table 1 shows the final performance (at 200k timesteps) of LC3 with RFFs for six environments, and includes its ranking against the benchmark results from Wang et al. [2019]. We find that LC3 consistently performs well on simple continuous control tasks, and it works well even without posterior sampling. When the dynamical complexity increases, such as with the contact-rich Hopper model, our method's performance suffers, suggesting that these scenarios require a different feature representation. (A hedged code sketch of the RFF feature map and the posterior-sampling model update appears after this table.)
Researcher Affiliation | Collaboration | Sham Kakade (University of Washington; Microsoft Research NYC), Akshay Krishnamurthy (Microsoft Research NYC), Kendall Lowrey (University of Washington), Motoya Ohnishi (University of Washington), Wen Sun (Cornell University)
Pseudocode | Yes | Algorithm 1: Lower Confidence-based Continuous Control (LC3). (A structural sketch of the episodic loop appears after this table.)
Open Source Code | Yes | Project page: https://sites.google.com/view/lc3algorithm/
Open Datasets | Yes | We use some common benchmark tasks, including MuJoCo [Todorov et al., 2012] environments from OpenAI Gym [Brockman et al., 2016].
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages, counts, or references to predefined splits for reproducibility. It discusses "episodes" in the online learning context, but not traditional dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments.
Software Dependencies | No | The paper mentions the "Lyceum framework under the Julia programming language" and "MuJoCo environments from OpenAI Gym", but it does not specify concrete version numbers for these software components or for any other libraries used, which limits reproducibility.
Experiment Setup | No | The paper mentions general aspects of the experimental setup, such as using MPPI for planning and RFFs for features, and refers to parameters like the "regularizer λ" and the "confidence parameter C1". However, it does not explicitly provide specific numerical values for hyperparameters or other detailed training configurations in the main text, stating instead that "Further details of the experiments in this section can be found in Appendix D."
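
The Research Type row above describes the experimental pipeline only at a high level. As a concrete illustration, here is a minimal sketch of the two model-side ingredients it names: a Random Fourier Feature map for φ [Rahimi and Recht, 2008] and a posterior-sampling update over linear dynamics in feature space (the paper's model assumption is x' = W*φ(x, u) + Gaussian noise). All dimensions, the bandwidth, the regularizer `lam`, and the noise scale `sigma` are illustrative assumptions, not values from the paper, which defers such details to Appendix D.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Random Fourier Features (Rahimi & Recht, 2008) approximating an RBF kernel ---
# Dimensions and unit bandwidth are illustrative placeholders, not the paper's settings.
d_in, d_feat = 5, 256                                # input dim (state + action), feature dim
W_rff = rng.normal(scale=1.0, size=(d_feat, d_in))   # random frequencies
b_rff = rng.uniform(0.0, 2 * np.pi, size=d_feat)     # random phases

def phi(x, u):
    """Feature map phi(x, u) as a random cosine projection of the state-action pair."""
    z = np.concatenate([x, u])
    return np.sqrt(2.0 / d_feat) * np.cos(W_rff @ z + b_rff)

# --- Posterior sampling over linear dynamics in feature space ---
# lam (ridge regularizer) and sigma (noise scale) are assumed values; the paper
# calls the regularizer "lambda" but does not state a number in the main text.
lam, sigma, d_state = 1e-3, 0.1, 3
Sigma = lam * np.eye(d_feat)      # regularized feature second-moment matrix
B = np.zeros((d_feat, d_state))   # accumulated phi * x_next^T cross terms

def update(transitions):
    """Fold (x, u, x_next) triples into the least-squares statistics."""
    for x, u, x_next in transitions:
        f = phi(x, u)
        Sigma[...] += np.outer(f, f)
        B[...] += np.outer(f, x_next)

def sample_model():
    """Draw one dynamics matrix from the Gaussian posterior around the ridge estimate."""
    Sigma_inv = np.linalg.inv(Sigma)
    W_mean = Sigma_inv @ B                 # ridge / regularized least-squares estimate
    cov = sigma**2 * Sigma_inv             # shared column covariance under isotropic noise
    cols = [rng.multivariate_normal(W_mean[:, j], cov) for j in range(d_state)]
    return np.stack(cols, axis=1)          # shape (d_feat, d_state)
```

With isotropic Gaussian noise, the columns of the dynamics matrix are conditionally independent given the data and share the covariance σ²Σ⁻¹, which is why the draw above samples each column separately.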
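And a structural sketch of the episodic loop, corresponding to the Thompson-sampling variant the experiments actually run rather than Algorithm 1's optimistic planning over a confidence ball. `env`, `mppi_plan`, and the fixed-horizon handling are hypothetical interfaces, not the paper's API; `update` and `sample_model` refer to the previous sketch.

```python
def lc3_thompson(env, mppi_plan, n_episodes, horizon):
    """Posterior-sampling control loop (a sketch, not Algorithm 1 verbatim).

    Per episode: draw a dynamics model from the posterior, plan against it
    with MPPI at every step, then fold the observed transitions back into
    the least-squares statistics before the next episode.
    """
    for _ in range(n_episodes):
        W = sample_model()                  # one posterior draw per episode
        x = env.reset()
        transitions = []
        for _ in range(horizon):
            u = mppi_plan(W, x)             # plan against the sampled dynamics
            x_next = env.step(u)
            transitions.append((x, u, x_next))
            x = x_next
        update(transitions)                 # refresh Sigma and B
```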