Robust Model Based Reinforcement Learning Using $\mathcal{L}_1$ Adaptive Control

Authors: Minjun Sung, Sambhu Harimanas Karumanchi, Aditya Gahlawat, Naira Hovakimyan

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "To evaluate the effectiveness of the L1-MBRL scheme, we conduct extensive numerical simulations using two baseline MBRL algorithms across multiple environments, including scenarios with action or observation noise. The results unequivocally demonstrate that the L1-MBRL scheme enhances the performance of the underlying MBRL algorithms without any redesign or retuning of the L1 controller from one scenario to another." (A sketch of such a noise perturbation appears after this table.) |
| Researcher Affiliation | Academia | "Minjun Sung, Sambhu H. Karumanchi, Aditya Gahlawat, Naira Hovakimyan. Department of Mechanical Science & Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA. {mjsung2,shk9,gahlawat,nhovakim}@illinois.edu" |
| Pseudocode | Yes | "Algorithm 1: L1 ADAPTIVE CONTROL" (A hedged code sketch of this loop appears after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly available. |
| Open Datasets | Yes | "In our first experimental study, we evaluate the proposed L1-MBRL framework on five different OpenAI Gym environments (Brockman et al., 2016) with varying levels of state and action complexity." |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts) for its experiments. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions frameworks and algorithms such as OpenAI Gym, METRPO, and MBMF, but does not provide version numbers for the software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "For the Inverted Pendulum environment, we set ϵ = 1 and for Halfcheetah ϵ = 3, while for other environments we chose ϵ = 0.3. Additionally, we selected a cutoff frequency of ω = 0.35/Ts, where Ts represents the sampling time interval of the environment." (See the usage example after this table.) |
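The paper's Algorithm 1 is the $\mathcal{L}_1$ adaptive control loop that augments the baseline MBRL policy action. Since no source code is released, the following is a minimal illustrative sketch of the standard $\mathcal{L}_1$ structure (state predictor, piecewise-constant adaptation law, low-pass-filtered compensation), assuming for brevity a fully actuated system in which the matched uncertainty acts directly on every state; the class, parameter names, and simplifications are ours, not the authors' implementation.

```python
import numpy as np

class L1Augmentation:
    """Illustrative sketch of an L1 adaptive control loop: state predictor,
    piecewise-constant adaptation law, and first-order low-pass filter.
    Simplified to a fully actuated system with matched uncertainty only;
    all names here are hypothetical, not the paper's code."""

    def __init__(self, f_hat, state_dim, ts, omega_cutoff, a_s=-1.0):
        self.f_hat = f_hat                       # learned dynamics model: x_dot ~ f_hat(x, u)
        self.ts = ts                             # sampling interval T_s
        self.a_s = a_s                           # Hurwitz predictor gain (a_s < 0, elementwise)
        self.alpha = np.exp(-omega_cutoff * ts)  # discrete pole of the low-pass filter
        self.x_hat = np.zeros(state_dim)         # predictor state
        self.sigma_hat = np.zeros(state_dim)     # piecewise-constant uncertainty estimate
        self.u_ad = np.zeros(state_dim)          # filtered adaptive input

    def step(self, x, u_rl):
        """Augment the baseline MBRL action u_rl at the current state x."""
        # Piecewise-constant adaptation from the prediction error x_tilde = x_hat - x.
        x_tilde = self.x_hat - x
        phi = (np.exp(self.a_s * self.ts) - 1.0) / self.a_s
        self.sigma_hat = -np.exp(self.a_s * self.ts) / phi * x_tilde

        # Low-pass filter the negated estimate to obtain the adaptive input.
        self.u_ad = self.alpha * self.u_ad - (1.0 - self.alpha) * self.sigma_hat

        # Total control: baseline action plus L1 compensation.
        u = u_rl + self.u_ad

        # Euler step of the state predictor with error feedback.
        x_hat_dot = self.f_hat(x, u) + self.sigma_hat + self.a_s * x_tilde
        self.x_hat = self.x_hat + self.ts * x_hat_dot
        return u
```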
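The Experiment Setup row reports the cutoff-frequency rule ω = 0.35/Ts. A short usage example of the sketch above, under an assumed sampling interval (the actual Ts is environment-specific and not restated in this excerpt):

```python
import numpy as np

ts = 0.05                        # assumed sampling interval; Ts is environment-specific
omega = 0.35 / ts                # cutoff-frequency rule reported in the paper
f_hat = lambda x, u: -x + u      # toy stand-in for the learned dynamics model
l1 = L1Augmentation(f_hat, state_dim=4, ts=ts, omega_cutoff=omega)
u = l1.step(x=np.random.randn(4), u_rl=np.zeros(4))  # augmented action for one step
```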
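The evaluation covers scenarios with action or observation noise on OpenAI Gym environments. Since the perturbation code is not released, one plausible way to reproduce such scenarios is a Gym wrapper along the following lines; the Gaussian noise model, its magnitudes, and the wrapper itself are our assumptions, not the paper's setup.

```python
import numpy as np
import gym

class NoisyEnv(gym.Wrapper):
    """Hypothetical wrapper injecting Gaussian action/observation noise;
    the paper's exact noise model and magnitudes are not specified here."""

    def __init__(self, env, obs_noise_std=0.0, act_noise_std=0.0):
        super().__init__(env)
        self.obs_noise_std = obs_noise_std
        self.act_noise_std = act_noise_std

    def step(self, action):
        if self.act_noise_std > 0:
            action = action + np.random.normal(0.0, self.act_noise_std, np.shape(action))
            action = np.clip(action, self.action_space.low, self.action_space.high)
        obs, reward, done, info = self.env.step(action)
        if self.obs_noise_std > 0:
            obs = obs + np.random.normal(0.0, self.obs_noise_std, np.shape(obs))
        return obs, reward, done, info

# Example: HalfCheetah with mild observation noise.
env = NoisyEnv(gym.make("HalfCheetah-v2"), obs_noise_std=0.1)
```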