Adversarial Environment Design via Regret-Guided Diffusion Models

Authors: Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments.
Researcher Affiliation | Academia | Hojun Chung (1), Junseo Lee (1), Minsoo Kim (1), Dohyeong Kim (2), and Songhwai Oh (1, 2); (1) Interdisciplinary Program in Artificial Intelligence and ASRI, Seoul National University; (2) Department of Electrical and Computer Engineering and ASRI, Seoul National University. {hojun.chung, junseo.lee, minsoo.kim, dohyeong.kim}@rllab.snu.ac.kr, songhwai@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Adversarial Environment Design via Regret-Guided Diffusion Models
Open Source Code | Yes | Project page: https://rllab-snu.github.io/projects/ADD
Open Datasets | Yes | We first evaluate the proposed method on a maze navigation task [14], which is based on the Minigrid [53].
Dataset Splits | No | The paper describes training and then testing on unseen environments for generalization evaluation, but it does not explicitly mention using a separate validation set or validation split during the model's training or development.
Hardware Specification | Yes | All methods are trained utilizing RTX 3090Ti.
Software Dependencies | No | The paper mentions several software components and frameworks, such as PPO, Minigrid, and OpenAI Gym, and references external implementations, but it does not provide specific version numbers for these or other key software dependencies (e.g., the Python or PyTorch version used).
Experiment Setup | Yes | We report detailed hyperparameters used to train diffusion models in Table 3. Lastly, we report detailed hyperparameters for regret-guided diffusion process and training the environment critic in Table 4. The same parameters and network architecture were used to train the ADD agent, and the hyperparameters used are reported in the Table 2.