Optimistic Active Exploration of Dynamical Systems
Authors: Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we compare OPAX with other heuristic active exploration approaches on several environments. These show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks. We evaluate OPAX on several simulated robotic tasks with state dimensions ranging from two to 58, and the empirical results validate our theoretical conclusions, with OPAX consistently delivering strong performance across all tested environments. |
| Researcher Affiliation | Academia | ETH Zürich¹, MPI for Intelligent Systems²; {sukhijab,trevenl,scoros,krausea}@ethz.ch, {cansu.sancaktar,sebastian.blaes}@tuebingen.mpg.de |
| Pseudocode | Yes | **OPAX: Optimistic Active Exploration.** Init: aleatoric uncertainty σ, probability δ, statistical model (µ₀, σ₀, β₀(δ)). For episode n = 1, …, N: πₙ = argmax_{π∈Π} max_{η∈Ξ} E[Σₜ Σⱼ ln(1 + σ²_{n−1,j}(xₜ, π(xₜ))/σ²)] ▷ prepare policy; Dₙ ← ROLLOUT(πₙ) ▷ collect measurements; update (µₙ, σₙ, βₙ(δ)) from D₁:ₙ ▷ update model. |
| Open Source Code | Yes | Finally, we provide an efficient implementation of OPAX in JAX (Bradbury et al., 2018): https://github.com/lasgroup/opax |
| Open Datasets | Yes | We evaluate OPAX on the Pendulum-v1 and Mountain Car environments from the OpenAI Gym benchmark suite (Brockman et al., 2016), on the Reacher, Swimmer, and Cheetah from the DeepMind Control Suite (Tassa et al., 2018), and on a high-dimensional simulated robotic manipulation task introduced by Li et al. (2020). |
| Dataset Splits | No | No explicit mention of specific train/validation/test dataset splits with percentages, counts, or references to predefined splits. The paper describes an episodic setting where data is collected and used to update a model. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were explicitly mentioned for running the experiments or training the models. |
| Software Dependencies | No | Finally, we provide an efficient implementation of OPAX in JAX (Bradbury et al., 2018). However, no specific version numbers for JAX or other libraries such as Python, PyTorch, or TensorFlow are provided. |
| Experiment Setup | Yes | Table 3: Hyperparameters for results in Section 5. Table 4: Parameters of iCEM optimizer for experiments in Section 5. Table 5: Parameters of model-based SAC optimizer for experiments in Section 5. Table 6: Environment and model settings used for the experiment results shown in Figure 4. Table 7: Base settings for iCEM as they are used in the intrinsic phase. Same settings are used for all methods. Table 8: iCEM hyperparameters used for zero-shot generalization in the extrinsic phase. Any settings not specified here are the same as the general settings given in Table 7. |