MADE: Exploration via Maximizing Deviation from Explored Regions

Authors: Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph E. Gonzalez, Stuart Russell

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies. When tested on navigation and locomotion tasks from MiniGrid and DeepMind Control Suite benchmarks, our approach significantly improves sample efficiency over state-of-the-art methods.
Researcher Affiliation | Collaboration | University of California, Berkeley ({tianjunz,paria.rashidinejad,jiantao,russell}@berkeley.edu); Facebook AI Research (yuandong@fb.com)
Pseudocode | Yes | Algorithm 1: Policy computation for adaptively regularized objective
Open Source Code | Yes | Our code is available at https://github.com/tianjunz/MADE.
Open Datasets | Yes | When tested in the procedurally-generated MiniGrid environments [19], MADE manages to converge... In DeepMind Control Suite [95], we build upon...
Dataset Splits | No | The paper notes that training details are in the supplemental material, but the main text does not specify exact train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper explicitly answers 'No' to the checklist question 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?', and no specific hardware details are given in the main text.
Software Dependencies | No | The paper names several software components and algorithms it builds on (e.g., IMPALA, RAD, Dreamer, RND, ICM) but does not provide version numbers for them, nor for any programming languages or libraries.
Experiment Setup | Yes | Details on experiments and hyperparameters are provided in Appendix B.
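The Research Type row above quotes the paper's comparison of its new intrinsic reward against count-only exploration strategies on tabular examples. For context on that baseline only (this is not MADE's intrinsic reward; the actual adaptively regularized objective and Algorithm 1 are defined in the paper and in the repository linked above), below is a minimal sketch of tabular Q-learning with a count-based bonus. The environment interface (env.reset, env.step, env.num_actions), the bonus coefficient beta, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

# Sketch of a count-only exploration baseline (NOT MADE's intrinsic reward):
# a bonus proportional to 1 / sqrt(N(s, a)) is added to the extrinsic reward
# inside a tabular Q-learning update. Environment interface and hyperparameters
# are illustrative assumptions.

def count_bonus_q_learning(env, episodes=500, alpha=0.1, gamma=0.99, beta=0.1):
    Q = defaultdict(float)   # Q[(state, action)] value estimates
    N = defaultdict(int)     # visitation counts N[(state, action)]

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Greedy action w.r.t. current Q; exploration comes from the bonus.
            action = max(range(env.num_actions), key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Count-based bonus shrinks as (state, action) becomes well explored.
            N[(state, action)] += 1
            bonus = beta / np.sqrt(N[(state, action)])

            target = reward + bonus
            if not done:
                target += gamma * max(Q[(next_state, a)] for a in range(env.num_actions))
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state
    return Q
```

MADE replaces this kind of count-only bonus with a reward derived from maximizing deviation from the explored regions; for the actual implementation, see the repository listed in the Open Source Code row.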