Generalization through Diversity: Improving Unsupervised Environment Design

Authors: Wenjun Li, Pradeep Varakantham, Dexun Li

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in literature.
Researcher Affiliation | Academia | Wenjun Li, Pradeep Varakantham, Dexun Li; Singapore Management University; {wjli.2020, pradeepv, dexunli.2019}@smu.edu.sg
Pseudocode | Yes | The complete procedure of DIPLR is presented in Algorithm 1.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology, nor a link to a code repository.
Open Datasets | Yes | We conduct experiments and empirically demonstrate the effectiveness and generality of DIPLR on three popular yet highly distinct UPOMDP domains, Minigrid, Bipedal-Walker and Car-Racing.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages, sample counts, or citations to standard splits) for reproducibility.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, or memory) used to run the experiments.
Software Dependencies | No | The paper mentions using Proximal Policy Optimization (PPO) and a Wasserstein distance solver from [Flamary et al., 2021], but it does not give version numbers for these or other software dependencies (a hedged usage sketch of the solver follows the table).
Experiment Setup | Yes | We train all the student agents for 30k PPO updates (~250M steps)... We could assign different weights to diversity and regret by letting the replay probability P_replay = ρ · P_D + (1 − ρ) · P_R, where P_D and P_R are the prioritization of diversity and regret respectively, and ρ is the tuning parameter. (See the replay-probability sketch below the table.)
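
The Wasserstein solver cited under Software Dependencies, [Flamary et al., 2021], refers to the POT (Python Optimal Transport) library. A minimal sketch of how a level-to-level distance could be computed with it, assuming each level is summarized by a set of feature vectors (the feature extraction here is a stand-in, not the paper's exact pipeline):

```python
# Sketch: Wasserstein (earth mover's) distance between two levels,
# each represented by a set of feature vectors -- e.g., states visited
# by the student agent. The feature representation is an assumption.
import numpy as np
import ot  # POT: Python Optimal Transport [Flamary et al., 2021]

def wasserstein_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Exact Wasserstein distance between two empirical feature distributions."""
    n, m = len(feats_a), len(feats_b)
    # Uniform weights over the samples of each level.
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Pairwise Euclidean ground-cost matrix.
    M = ot.dist(feats_a, feats_b, metric="euclidean")
    # emd2 returns the optimal transport cost, i.e., the distance itself.
    return ot.emd2(a, b, M)

# Example: two levels summarized by 50 and 60 four-dimensional features.
rng = np.random.default_rng(0)
d = wasserstein_distance(rng.normal(size=(50, 4)), rng.normal(size=(60, 4)))
print(f"Wasserstein distance: {d:.4f}")
```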
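
The replay probability quoted under Experiment Setup mixes two prioritization distributions over the level buffer. A minimal sketch, assuming P_D and P_R are rank-based prioritizations in the style of Prioritized Level Replay (the exact form of each distribution is an assumption, not taken from the paper):

```python
import numpy as np

def replay_probabilities(diversity_scores, regret_scores, rho=0.5, temperature=1.0):
    """P_replay = rho * P_D + (1 - rho) * P_R over a buffer of levels.

    P_D and P_R are built here as rank-based prioritizations (PLR-style);
    their exact form in the paper may differ.
    """
    def rank_prioritization(scores):
        # Rank 1 = highest score; weight proportional to rank^(-1/temperature).
        scores = np.asarray(scores, dtype=float)
        ranks = np.empty(len(scores))
        ranks[np.argsort(-scores)] = np.arange(1, len(scores) + 1)
        weights = ranks ** (-1.0 / temperature)
        return weights / weights.sum()

    p_d = rank_prioritization(diversity_scores)  # P_D: diversity prioritization
    p_r = rank_prioritization(regret_scores)     # P_R: regret prioritization
    return rho * p_d + (1.0 - rho) * p_r

# Example: sample a level to replay from a 5-level buffer.
p = replay_probabilities([0.9, 0.1, 0.4, 0.7, 0.2], [0.3, 0.8, 0.5, 0.1, 0.6], rho=0.5)
level = np.random.default_rng(0).choice(5, p=p)
```

Setting ρ = 1 replays purely for diversity, ρ = 0 recovers pure regret-based replay, and intermediate values trade the two off as described in the quoted setup.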