Information-Theoretic Safe Exploration with Gaussian Processes
Authors: Alessandro Bottero, Carlos Luis, Julia Vinogradska, Felix Berkenkamp, Jan R. Peters
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate improved data-efficiency and scalability. ... In this section we empirically evaluate ISE. |
| Researcher Affiliation | Collaboration | (1) Bosch Center for Artificial Intelligence, Germany; (2) Technische Universität Darmstadt, Germany |
| Pseudocode | Yes | Algorithm 1 Information-Theoretic Safe Exploration |
| Open Source Code | Yes | The code is available at https://github.com/boschresearch/information-theoretic-safe-exploration. |
| Open Datasets | No | The paper uses either generated GP samples or interaction data from the OpenAI Gym framework. It cites the OpenAI Gym framework but does not provide a direct link or citation to a specific publicly available dataset used for training/evaluation. |
| Dataset Splits | No | The paper does not specify traditional train/validation/test dataset splits (e.g., percentages or counts). Experiments involve iterative data collection and evaluation within an exploration process rather than predefined splits. |
| Hardware Specification | Yes | All experiments were performed on either a desktop PC with an Intel i7-2600 CPU and a single NVIDIA RTX 2080 Ti GPU, or on an internal cluster running on machines with Intel Xeon E5-2630 CPUs. |
| Software Dependencies | No | The paper mentions using PyTorch and GPyTorch for implementation but does not specify their version numbers. |
| Experiment Setup | Yes | As commonly done in the literature (see Section 5), we set β_n = 2 for all experiments. ... We select 100 samples from a two-dimensional GP with RBF kernel, defined in [−2.5, 2.5] × [−2.5, 2.5] and run ISE and STAGEOPT for 100 iterations for each sample. ... For the inverted pendulum task, we used an episode length of 200 time steps and a threshold θ_M = 1.5 rad/s. For the cart pole task, we used an episode length of 200 time steps and a threshold θ_M = 0.2 rad. |
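
The synthetic part of the experiment setup (functions sampled from a two-dimensional GP with RBF kernel on [−2.5, 2.5] × [−2.5, 2.5], with β_n = 2) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain NumPy rather than PyTorch/GPyTorch, and the function names, lengthscale, grid resolution, noise level, and the safety threshold of 0 are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two sets of points."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)


def sample_gp_prior(n_per_dim=15, bound=2.5, lengthscale=1.0, seed=0):
    """Draw one zero-mean GP sample on a grid over [-bound, bound]^2.

    Grid resolution and lengthscale are assumed values, not from the paper.
    """
    rng = np.random.default_rng(seed)
    axis = np.linspace(-bound, bound, n_per_dim)
    xx, yy = np.meshgrid(axis, axis)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    cov = rbf_kernel(grid, grid, lengthscale)
    cov += 1e-6 * np.eye(len(grid))  # jitter for numerical stability
    values = np.linalg.cholesky(cov) @ rng.standard_normal(len(grid))
    return grid, values


def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, variance=1.0,
                 noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train, lengthscale, variance)
    k_tt += noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, lengthscale, variance)
    solved = np.linalg.solve(k_tt, k_st.T)  # shape (n_train, n_test)
    mean = solved.T @ y_train
    var = variance - np.sum(k_st * solved.T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def safe_mask(mean, std, beta=2.0, threshold=0.0):
    """Points whose lower confidence bound mean - beta * std clears the
    threshold. The threshold value of 0 is an assumption."""
    return mean - beta * std >= threshold


if __name__ == "__main__":
    grid, values = sample_gp_prior()
    observed = np.arange(0, len(grid), 10)  # pretend these were evaluated
    mean, std = gp_posterior(grid[observed], values[observed], grid)
    mask = safe_mask(mean, std, beta=2.0)
    print(f"{mask.sum()} of {len(grid)} grid points classified as safe")
```

Repeating `sample_gp_prior` with 100 different seeds would mirror the "100 samples" described in the table, although the kernel hyperparameters used in the paper are not stated here.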