Generalization through Diversity: Improving Unsupervised Environment Design
Authors: Wenjun Li, Pradeep Varakantham, Dexun Li
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in literature. |
| Researcher Affiliation | Academia | Wenjun Li, Pradeep Varakantham, Dexun Li; Singapore Management University; {wjli.2020, pradeepv, dexunli.2019}@smu.edu.sg |
| Pseudocode | Yes | The complete procedure of DIPLR is presented in Algorithm 1. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments and empirically demonstrate the effectiveness and generality of DIPLR on three popular yet highly distinct UPOMDP domains, Minigrid, Bipedal-Walker and Car-Racing. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard split citations with authors/year) for reproducibility. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) and a Wasserstein distance solver from [Flamary et al., 2021], but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train all the student agents for 30k PPO updates (≈250M steps)... We could assign different weights to diversity and regret by letting the replay probability $P_{\text{replay}} = \rho P_D + (1 - \rho) P_R$, where $P_D$ and $P_R$ are the prioritization of diversity and regret respectively, and $\rho$ is the tuning parameter. |
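The mixing rule quoted in the Experiment Setup row can be sketched as follows. The excerpt only states that $P_D$ and $P_R$ are prioritizations over diversity and regret. Here we assume, as a hypothetical choice, that each is a rank-based distribution of the kind used in Prioritized Level Replay (PLR), with weights $(1/\text{rank})^{1/\beta}$ and temperature $\beta$. The function names, the score values, and $\beta$ are illustrative and not taken from DIPLR, for which no code release is indicated.

```python
from typing import List


def rank_prioritization(scores: List[float], beta: float = 0.1) -> List[float]:
    """Assumed PLR-style rank prioritization: weight_i = (1 / rank_i) ** (1 / beta).

    The highest score gets rank 1 and therefore the largest probability.
    Smaller beta makes the distribution sharper.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    weights = [0.0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** (1.0 / beta)
    total = sum(weights)
    return [w / total for w in weights]


def replay_distribution(diversity: List[float], regret: List[float],
                        rho: float, beta: float = 0.1) -> List[float]:
    """P_replay = rho * P_D + (1 - rho) * P_R over the levels in the buffer."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if len(diversity) != len(regret):
        raise ValueError("need one diversity and one regret score per level")
    p_d = rank_prioritization(diversity, beta)  # prioritization by diversity
    p_r = rank_prioritization(regret, beta)     # prioritization by regret
    # A convex combination of two distributions is itself a distribution.
    return [rho * d + (1.0 - rho) * r for d, r in zip(p_d, p_r)]
```

For example, with hypothetical scores for three buffered levels, `replay_distribution([0.9, 0.1, 0.5], [0.2, 0.8, 0.4], rho=0.5)` returns a probability vector that could be passed as `weights` to `random.choices` to pick the next level to replay. Setting `rho=1.0` replays purely by diversity and `rho=0.0` recovers regret-only prioritization.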