Generative Exploration and Exploitation

Authors: Jiechuan Jiang, Zongqing Lu
Pages: 4337-4344

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.
Researcher Affiliation | Academia | Jiechuan Jiang, Peking University (jiechuan.jiang@pku.edu.cn); Zongqing Lu, Peking University (zongqing.lu@pku.edu.cn)
Pseudocode | Yes | Algorithm 1 details the training of GENE.
Open Source Code | No | The paper links to task details and hyperparameters but gives no explicit statement about, or link to, source code for the described method.
Open Datasets | No | The experiments use RL environments (Maze, Maze Ant, Cooperative Navigation) that generate data through interaction; the paper provides no access information (link, DOI, repository, or formal citation with authors and year) for a publicly available dataset.
Dataset Splits | No | The paper does not specify train, validation, or test splits. It mentions training a VAE but gives no data splits for the main RL experiments.
Hardware Specification | No | The paper does not describe the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper names the base RL algorithms (PPO, TRPO, MADDPG) and a VAE but gives no version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Every episode, the agent starts from the generated states S with a probability p, otherwise from the initial state. The probability p could be seen as how much to change the start state distribution. ... Every T episodes, we train the VAE from scratch using the states stored in B0 and B1. ... All the experimental results are presented using mean and standard deviation of five runs."
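The experiment-setup excerpt describes a start-state schedule: each episode begins from a generated state in S with probability p (otherwise from the initial state), and the generative model is retrained from scratch every T episodes on the stored states. The following is a minimal sketch of that schedule only, under loudly stated assumptions: states are tuples of floats, B0 and B1 are pooled into one hypothetical buffer, the "rollout" is a one-step random perturbation, no generated states exist before the first retraining, and the hypothetical `refit_generator` merely resamples stored states instead of training a VAE, so it says nothing about how GENE decides which states to generate. All function names are illustrative, not the authors' code. It also shows one way to report mean and standard deviation over five runs; the paper does not say which standard-deviation convention it uses, so the sample version (`statistics.stdev`) is an assumption.

```python
import random
import statistics
from typing import List, Sequence, Tuple

State = Tuple[float, ...]


def choose_start_state(initial_state: State, generated: Sequence[State],
                       p: float, rng: random.Random) -> State:
    """With probability p, start from a generated state; otherwise use the initial state."""
    if generated and rng.random() < p:
        return rng.choice(generated)
    return initial_state


def refit_generator(buffer: Sequence[State], n: int, rng: random.Random) -> List[State]:
    """Hypothetical stand-in for "train the VAE from scratch" followed by sampling.

    It only resamples stored states: it is NOT a VAE and does not reproduce
    how GENE chooses which states to generate.
    """
    pool = list(buffer)  # rebuilt from nothing on every call, mirroring "from scratch"
    return [rng.choice(pool) for _ in range(n)] if pool else []


def run_schedule(num_episodes: int, T: int, p: float, n_generated: int,
                 initial_state: State, seed: int = 0) -> List[State]:
    """Return the start state of every episode under the mixing schedule."""
    rng = random.Random(seed)
    buffer: List[State] = []      # hypothetical pooled stand-in for B0 and B1
    generated: List[State] = []   # the current set of generated start states S
    starts: List[State] = []
    for episode in range(num_episodes):
        # Every T episodes, rebuild the (placeholder) generator from the buffer.
        if episode > 0 and episode % T == 0:
            generated = refit_generator(buffer, n_generated, rng)
        start = choose_start_state(initial_state, generated, p, rng)
        starts.append(start)
        # Placeholder rollout: record one hypothetical visited state near the start.
        buffer.append(tuple(x + rng.uniform(-1.0, 1.0) for x in start))
    return starts


def mean_and_std(final_returns: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across independent runs (e.g. five runs)."""
    return statistics.mean(final_returns), statistics.stdev(final_returns)


if __name__ == "__main__":
    starts = run_schedule(num_episodes=50, T=10, p=0.5, n_generated=5,
                          initial_state=(0.0, 0.0))
    from_initial = sum(s == (0.0, 0.0) for s in starts)
    print(f"{from_initial} of {len(starts)} episodes started from the initial state")
```

With p = 0 the schedule reduces to ordinary training from the initial state; with p = 1 every episode after the first retraining starts from a generated state. Varying p between these extremes is one way to read the remark that p controls how much the start-state distribution changes.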