FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems

Authors: Matthieu Blanke, Marc Lelarge

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run several experiments to validate our method. Our code and a demonstration video are available at https://github.com/MB-29/exploration. More details about the experiments can be found in Appendix B.
Researcher Affiliation | Academia | Matthieu Blanke (1), Marc Lelarge (1). (1) INRIA, DI ENS, PSL Research University. Correspondence to: Matthieu Blanke <matthieu.blanke@inria.fr>.
Pseudocode | Yes | Algorithm 1: Active exploration; Algorithm 2: Flexible Exploration (FLEX). (A schematic exploration loop is sketched after the table.)
Open Source Code | Yes | Our code and a demonstration video are available at https://github.com/MB-29/exploration.
Open Datasets | Yes | We experiment on the pendulum and the cartpole of the DeepMind control suite (Tunyasuvunakool et al., 2020). (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper discusses training models and evaluating them but does not provide specific train/validation/test split percentages, sample counts, or detailed splitting methodology.
Hardware Specification | No | The paper mentions measuring computational time 'on a laptop' but does not provide specific details about the hardware components (e.g., CPU, GPU models, memory).
Software Dependencies | No | The paper mentions using 'Pytorch (Paszke et al., 2017)' and the 'Adam optimizer (Kingma & Ba, 2015)', as well as the 'mpc package and the iLQR algorithm for exploitation (Amos et al., 2018)', but it does not specify version numbers for these software components.
Experiment Setup | Yes | The learning models include various degrees of prior knowledge on the dynamics. A linear model is used for the pendulum, and neural networks are used for the other environments. The Jacobians of Proposition 2 are computed using automatic differentiation. At each time step, the model is evaluated with (2.3) computed over a fixed grid. Neural networks... of width 8 with one hidden layer and tanh nonlinearity trained using ADAM optimizer... with a batch size of 100. Specific learning rates and cost functions are provided for different environments. (A model-setup sketch follows the table.)
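
The pseudocode row above lists Algorithm 1 (Active exploration) and Algorithm 2 (FLEX). The sketch below is only a generic skeleton of an active-exploration loop, not the authors' algorithm: the fit_model and plan_input callables are abstract placeholders, and the specific planning criterion used by FLEX is described in the paper.

    def active_exploration(env_step, fit_model, plan_input, x0, horizon):
        """Generic active-exploration skeleton: alternate between fitting a
        dynamics model on the data gathered so far and choosing the next
        control input according to some informativeness criterion."""
        data = []            # collected transitions (x_t, u_t, x_{t+1})
        x = x0
        model = None
        for t in range(horizon):
            model = fit_model(data)       # refit or update the dynamics model
            u = plan_input(model, x)      # pick an informative admissible input (placeholder step)
            x_next = env_step(x, u)       # apply the input to the real system
            data.append((x, u, x_next))
            x = x_next
        return model, data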
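
The open-datasets row refers to the pendulum and cartpole environments of the DeepMind control suite. As an illustration only (not taken from the paper's repository), these environments can be loaded with the dm_control package roughly as follows; the 'swingup' task names are an assumption.

    import numpy as np
    from dm_control import suite

    # Load the two environments mentioned in the paper; the task names used
    # here ("swingup") are an assumption, other tasks exist for each domain.
    pendulum = suite.load(domain_name="pendulum", task_name="swingup")
    cartpole = suite.load(domain_name="cartpole", task_name="swingup")

    # Roll out a short random trajectory to inspect observations and actions.
    spec = pendulum.action_spec()
    time_step = pendulum.reset()
    for _ in range(10):
        action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
        time_step = pendulum.step(action)
        print(time_step.observation)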
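
The experiment-setup row describes one-hidden-layer, width-8 tanh networks trained with Adam in batches of 100, with Jacobians obtained by automatic differentiation. A minimal PyTorch sketch consistent with that description follows; the state/action dimensions, learning rate, loss, and dummy data are placeholders, since the paper reports environment-specific values.

    import torch
    import torch.nn as nn

    state_dim, action_dim = 4, 1   # placeholder dimensions (cartpole-like)
    lr = 1e-2                      # placeholder; the paper uses per-environment learning rates

    # One hidden layer of width 8 with tanh nonlinearity, as described in the paper.
    model = nn.Sequential(
        nn.Linear(state_dim + action_dim, 8),
        nn.Tanh(),
        nn.Linear(8, state_dim),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()         # placeholder cost; the paper gives environment-specific costs

    def train_step(inputs, targets):
        """One gradient step on a batch of (state, action) -> next-state data."""
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Dummy batch of size 100, matching the reported batch size.
    inputs = torch.randn(100, state_dim + action_dim)
    targets = torch.randn(100, state_dim)
    train_step(inputs, targets)

    # Jacobian of the predicted dynamics at one input, via automatic differentiation.
    x = torch.randn(state_dim + action_dim)
    jacobian = torch.autograd.functional.jacobian(model, x)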