reproducibilityindex.ai

Adversarial Environment Design via Regret-Guided Diffusion Models

Authors: Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments.
Researcher Affiliation	Academia	Hojun Chung¹, Junseo Lee¹, Minsoo Kim¹, Dohyeong Kim², and Songhwai Oh¹,²; ¹Interdisciplinary Program in Artificial Intelligence and ASRI, Seoul National University; ²Department of Electrical and Computer Engineering and ASRI, Seoul National University; {hojun.chung, junseo.lee, minsoo.kim, dohyeong.kim}@rllab.snu.ac.kr, songhwai@snu.ac.kr
Pseudocode	Yes	Algorithm 1 Adversarial Environment Design via Regret-Guided Diffusion Models
Open Source Code	Yes	Project page: https://rllab-snu.github.io/projects/ADD
Open Datasets	Yes	We first evaluate the proposed method on a maze navigation task [14], which is based on the Minigrid [53].
Dataset Splits	No	The paper describes training and testing on unseen environments for generalization evaluation, but it does not explicitly mention using a separate 'validation set' or a 'validation split' during the model's training or development process.
Hardware Specification	Yes	All methods are trained utilizing RTX 3090Ti.
Software Dependencies	No	The paper mentions several algorithms and software frameworks, such as PPO, Minigrid, and OpenAI Gym, and references external implementations, but it does not provide specific version numbers for these or other key software dependencies (e.g., the Python version, or library versions such as PyTorch 1.9).
Experiment Setup	Yes	We report detailed hyperparameters used to train diffusion models in Table 3. Lastly, we report detailed hyperparameters for regret-guided diffusion process and training the environment critic in Table 4. The same parameters and network architecture were used to train the ADD agent, and the hyperparameters used are reported in Table 2.