Adversarial Environment Design via Regret-Guided Diffusion Models
Authors: Hojun Chung, Junseo Lee, Minsoo Kim, Dohyeong Kim, Songhwai Oh
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments. |
| Researcher Affiliation | Academia | Hojun Chung¹, Junseo Lee¹, Minsoo Kim¹, Dohyeong Kim², and Songhwai Oh¹,²; ¹Interdisciplinary Program in Artificial Intelligence and ASRI, Seoul National University; ²Department of Electrical and Computer Engineering and ASRI, Seoul National University; {hojun.chung, junseo.lee, minsoo.kim, dohyeong.kim}@rllab.snu.ac.kr, songhwai@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 Adversarial Environment Design via Regret-Guided Diffusion Models |
| Open Source Code | Yes | Project page: https://rllab-snu.github.io/projects/ADD |
| Open Datasets | Yes | We first evaluate the proposed method on a maze navigation task [14], which is based on the Minigrid [53]. |
| Dataset Splits | No | The paper describes training and testing on unseen environments for generalization evaluation, but it does not explicitly mention using a separate 'validation set' or a 'validation split' during the model's training or development process. |
| Hardware Specification | Yes | All methods are trained utilizing RTX 3090Ti. |
| Software Dependencies | No | The paper mentions several software components and frameworks, such as PPO, Minigrid, and OpenAI Gym, and references external implementations, but it does not give version numbers for these or for other key software dependencies (e.g., the Python version, or library versions such as PyTorch 1.9). |
| Experiment Setup | Yes | We report detailed hyperparameters used to train diffusion models in Table 3. Lastly, we report detailed hyperparameters for regret-guided diffusion process and training the environment critic in Table 4. The same parameters and network architecture were used to train the ADD agent, and the hyperparameters used are reported in Table 2. |