Effective Exploration Based on the Structural Information Principles

Authors: Xianghua Zeng, Hao Peng, Angsheng Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations in the MiniGrid, MetaWorld, and DeepMind Control Suite benchmarks demonstrate that SI2E significantly outperforms state-of-the-art exploration baselines regarding final performance and sample efficiency, with maximum improvements of 37.63% and 60.25%, respectively.
Researcher Affiliation | Academia | 1 State Key Laboratory of Software Development Environment, Beihang University, Beijing, China; 2 Zhongguancun Laboratory, Beijing, China
Pseudocode | Yes | Algorithm 2: Effective Exploration Based on Structural Information Principles
Open Source Code | Yes | For further research, the source code is available at https://github.com/SELGroup/SI2E.
Open Datasets | Yes | In this section, we present a comprehensive suite of comparative experiments on MiniGrid [Chevalier-Boisvert et al., 2018], MetaWorld [Yu et al., 2020], and the DeepMind Control Suite (DMControl) [Tunyasuvunakool et al., 2020] to evaluate the effectiveness of SI2E in terms of both final performance and sample efficiency.
Dataset Splits | No | The paper discusses environment steps and evaluation, but does not specify explicit training/validation/test splits with percentages or counts, as would be expected for dataset partitioning in a traditional supervised learning context.
Hardware Specification | Yes | In the experiments conducted for this work, we utilize a single NVIDIA RTX A1000 GPU and eight Intel Core i9 CPU cores clocked at 3.00GHz for each training run.
Software Dependencies | No | The paper mentions the A2C and DrQv2 algorithms and refers to their implementations, but it does not provide version numbers for these or for other software dependencies (e.g., Python, PyTorch, CUDA) in the description of its own experimental setup.
Experiment Setup | Yes | The comprehensive hyperparameters for the A2C algorithm are detailed in Table 4. Specific hyperparameters in the DrQv2 are summarized in Table 5. Across all exploration methods, we maintain fixed scale parameters β = 0.005 and k = 5, in line with the original framework.
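
To make the fixed settings in the Experiment Setup row concrete, here is a minimal sketch of how two such parameters are commonly used in exploration methods. It rests on assumptions this summary does not confirm: that β weights an intrinsic bonus added to the task reward, and that k is the neighbor index of a k-nearest-neighbor novelty estimate. The function names are hypothetical, and the bonus is a generic k-NN distance bonus, not SI2E's structural-information objective.

```python
import math

# Values quoted in the Experiment Setup row; their roles below are assumptions.
BETA = 0.005  # assumed: weight on the intrinsic (exploration) bonus
K = 5         # assumed: neighbor index for a k-nearest-neighbor estimate


def knn_bonus(states, k=K):
    """Return log(1 + distance to the k-th nearest neighbor) for each state."""
    if len(states) <= k:
        raise ValueError("need more than k states")
    bonuses = []
    for i, s in enumerate(states):
        # Euclidean distances from state i to every other state, ascending.
        dists = sorted(math.dist(s, t) for j, t in enumerate(states) if j != i)
        bonuses.append(math.log(1.0 + dists[k - 1]))
    return bonuses


def shaped_rewards(extrinsic, states, beta=BETA, k=K):
    """Combine task rewards with the scaled bonus: r = r_ext + beta * r_int."""
    return [r + beta * b for r, b in zip(extrinsic, knn_bonus(states, k))]


if __name__ == "__main__":
    # Hypothetical 1-D states: six clustered points and one far-away point.
    states = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (100.0,)]
    print(shaped_rewards([0.0] * len(states), states))
```

Under these assumptions, the isolated state receives the largest bonus, and holding β and k fixed across methods keeps the bonus scale comparable between baselines.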