Optimistic Active Exploration of Dynamical Systems

Authors: Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we compare OPAX with other heuristic active exploration approaches on several environments. Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks. We evaluate OPAX on several simulated robotic tasks with state dimensions ranging from 2 to 58. The empirical results validate the theoretical conclusions, with OPAX consistently delivering strong performance across all tested environments.
Researcher Affiliation | Academia | ETH Zürich (1); MPI for Intelligent Systems (2). {sukhijab,trevenl,scoros,krausea}@ethz.ch, {cansu.sancaktar,sebastian.blaes}@tuebingen.mpg.de
Pseudocode | Yes | OPAX: Optimistic Active Exploration. Init: aleatoric uncertainty σ, probability δ, statistical model (µ_0, σ_0, β_0(δ)). For episode n = 1, ..., N: π_n = argmax_{π ∈ Π} max_{η ∈ Ξ} E[Σ_{t=0}^{T−1} Σ_{j=1}^{d_x} log(1 + σ²_{n−1,j}(x_t, π(x_t)) / σ²)] (prepare policy); D_n ← ROLLOUT(π_n) (collect measurements); update (µ_n, σ_n, β_n(δ)) with D_{1:n} (update model).
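As a rough illustration (not the authors' released code), the per-step planning objective above can be sketched in JAX; epistemic_std, sigma_noise, beta, and eta are stand-ins for σ_{n−1}, σ, β_{n−1}(δ), and η:

```python
import jax.numpy as jnp

def intrinsic_reward(epistemic_std, sigma_noise):
    # Per-step OPAX objective: sum_j log(1 + sigma_{n-1,j}^2(x_t, u_t) / sigma^2),
    # i.e. an information-gain proxy that rewards visiting states where the
    # learned dynamics model is still uncertain.
    return jnp.sum(jnp.log(1.0 + (epistemic_std / sigma_noise) ** 2))

def optimistic_step(mu, epistemic_std, beta, eta):
    # Hallucinated transition: eta in [-1, 1]^dx selects the most favorable
    # plausible next state inside the model's confidence set
    # (the inner max over eta in Xi in the objective above).
    return mu + beta * epistemic_std * eta
```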
Open Source Code | Yes | "Finally, we provide an efficient implementation of OPAX in JAX (Bradbury et al., 2018)." The code is available at https://github.com/lasgroup/opax
Open Datasets | Yes | We evaluate OPAX on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016); on Reacher, Swimmer, and Cheetah from the DeepMind Control Suite (Tassa et al., 2018); and on a high-dimensional simulated robotic manipulation task introduced by Li et al. (2020).
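For reference, the listed benchmarks can be instantiated roughly as follows. This is a sketch: the exact task variants and wrappers used in the paper may differ, and the MountainCarContinuous-v0 ID as well as the Control Suite task names are assumptions on our part:

```python
import gym
from dm_control import suite

# OpenAI Gym benchmarks (Brockman et al., 2016)
pendulum = gym.make("Pendulum-v1")
mountain_car = gym.make("MountainCarContinuous-v0")  # assumed continuous variant

# DeepMind Control Suite tasks (Tassa et al., 2018); task names are assumptions
reacher = suite.load(domain_name="reacher", task_name="easy")
swimmer = suite.load(domain_name="swimmer", task_name="swimmer6")
cheetah = suite.load(domain_name="cheetah", task_name="run")
```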
Dataset Splits | No | No explicit mention of train/validation/test dataset splits (percentages, counts, or citations to predefined splits). The paper describes an episodic setting in which data is collected online and used to update the model.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were explicitly mentioned for running the experiments or training the models.
Software Dependencies | No | "Finally, we provide an efficient implementation of OPAX in JAX (Bradbury et al., 2018)." However, no specific version numbers are provided for JAX or other libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | Table 3: Hyperparameters for results in Section 5. Table 4: Parameters of the iCEM optimizer for experiments in Section 5. Table 5: Parameters of the model-based SAC optimizer for experiments in Section 5. Table 6: Environment and model settings used for the experiment results shown in Figure 4. Table 7: Base settings for iCEM as used in the intrinsic phase; the same settings are used for all methods. Table 8: iCEM hyperparameters used for zero-shot generalization in the extrinsic phase; any settings not specified there match the general settings in Table 7.
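To make the role of these tables concrete, here is a hypothetical sketch of what an iCEM configuration contains. The parameter names follow the iCEM algorithm (Pinneri et al., 2020); the values are placeholders, not the paper's actual settings from Tables 4, 7, and 8:

```python
from dataclasses import dataclass

@dataclass
class ICEMConfig:
    """Hypothetical iCEM optimizer settings; all values are illustrative placeholders."""
    horizon: int = 30                   # planning horizon (time steps)
    num_samples: int = 200              # action sequences sampled per iteration
    num_elites: int = 20                # elites kept to refit the sampling distribution
    num_iterations: int = 5             # CEM refinement iterations per planning step
    colored_noise_beta: float = 2.0     # exponent of the colored-noise action sampler
    elite_fraction_reused: float = 0.3  # fraction of elites carried over to the next step
    momentum: float = 0.1               # smoothing of mean/std updates across iterations
```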