Generative Exploration and Exploitation
Authors: Jiechuan Jiang, Zongqing Lu (pp. 4337-4344)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing. |
| Researcher Affiliation | Academia | Jiechuan Jiang, Peking University, jiechuan.jiang@pku.edu.cn; Zongqing Lu, Peking University, zongqing.lu@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1 details the training of GENE. |
| Open Source Code | No | The paper provides a link for task details and hyperparameters, but it does not provide an explicit statement or link to the source code for the described methodology. |
| Open Datasets | No | The paper mentions common RL environments (Maze, Maze Ant, Cooperative Navigation) which generate data through interaction, but it does not provide specific access information (link, DOI, repository, or formal citation with authors/year) for a publicly available or open dataset used in the experiments. |
| Dataset Splits | No | The paper does not provide specific details regarding train, validation, or test dataset splits needed for reproducibility. While it mentions training a VAE, it doesn't specify data splits for the main RL experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions base RL algorithms (PPO, TRPO, MADDPG) and VAE but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Every episode, the agent starts from the generated states S with a probability p, otherwise from the initial state. The probability p can be seen as how much the start-state distribution is changed. ... Every T episodes, we train the VAE from scratch using the states stored in B0 and B1. ... All the experimental results are presented using mean and standard deviation of five runs. (A minimal sketch of this loop follows the table.) |
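
The quoted setup is enough to reconstruct the outer loop of GENE. The sketch below is a minimal, hedged rendering of it in Python: `env`, `agent`, and `vae` are hypothetical interfaces (the paper does not release code), and the default values of `p`, `T`, and `n_samples` are placeholders, not the paper's hyperparameters.

```python
import random

def gene_training_loop(env, agent, vae, num_episodes, p=0.5, T=100, n_samples=64):
    """Sketch of the GENE outer loop: start-state generation + RL training.

    All interfaces here are assumptions: `env` is a resettable environment,
    `agent` wraps a base RL algorithm (PPO/TRPO/MADDPG in the paper), and
    `vae` is a generative model over visited states.
    """
    B0, B1 = [], []   # visited states from failed / successful episodes
    generated = []    # candidate start states sampled from the VAE

    for episode in range(num_episodes):
        # Every T episodes, retrain the VAE from scratch on the stored
        # states and draw a fresh batch of candidate start states.
        if episode % T == 0 and (B0 or B1):
            vae.fit(B0 + B1)                  # assumed method name
            generated = vae.sample(n_samples)  # assumed method name

        # With probability p, start from a generated state; otherwise
        # from the task's fixed initial state.
        if generated and random.random() < p:
            state = env.reset_to(random.choice(generated))  # assumed API
        else:
            state = env.reset()

        states, success = agent.rollout(env, state)  # e.g. one PPO episode
        (B1 if success else B0).extend(states)       # split by binary reward
        agent.update()                               # base RL policy update
```

Splitting visited states into B0 (failures) and B1 (successes) mirrors the buffers named in the quoted setup; how GENE weights the two buffers when fitting the VAE is not specified in the excerpt, so the sketch simply pools them.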