Unsupervised Training Sequence Design: Efficient and Generalizable Agent Training
Authors: Wenjun Li, Pradeep Varakantham
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically validate the effectiveness of the UTSD framework and demonstrate the transferability of the meta-teacher by comparing it to a set of leading baselines in UED: Domain Randomization (DR), PAIRED, PLR, and ACCEL. We conduct experiments on three popular yet distinct benchmarks in UED: Bit-Flipping, Lunar-Lander, and Minigrid. |
| Researcher Affiliation | Academia | Wenjun Li, Pradeep Varakantham Singapore Management University wjli.2020@phdcs.smu.edu.sg, pradeepv@smu.edu.sg |
| Pseudocode | Yes | Algorithm 1: Train meta-teacher |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The Bit-Flipping environment, introduced by (Andrychowicz et al. 2017), is widely used in RL for its efficiency. |
| Dataset Splits | Yes | Specifically, our approach makes two key contributions: 1. A scalable agent policy encoding method, which can help the teacher in UTSD closely track the student's overall ability and behaviors and consequently design efficient training sequences with finite length. 2. Train a generalizable teacher that can rapidly adapt to unseen students with various learning patterns and capabilities by employing the context-based meta-RL approach. Student Policy Encoding: In this section, we elaborate on how to collect a set of diverse environments regarding the student agent policy behaviors with the Quality Diversity method. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions various algorithms like DQN (Mnih et al. 2013), PPO (Schulman et al. 2017), SAC (Haarnoja et al. 2018), and PEARL (Rakelly et al. 2019) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In our experiments, the maximum training sequence length is set to 12 and the training amount on each task is fixed at 5k steps. |
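
The experiment-setup cell above pins down two quantities: a training sequence of at most 12 tasks and a fixed budget of 5k training steps per task. The Python sketch below illustrates how such a teacher-student loop could be wired up under those settings. It is a minimal sketch only: the class and method names (`TeacherPolicy`, `StudentAgent`, `propose_task`, `policy_encoding`) are hypothetical placeholders, since the paper does not release code, and the bodies are stubs rather than the authors' actual QD-based encoding or meta-RL teacher update.

```python
# Hypothetical sketch of the UTSD-style teacher-student loop described in the
# paper's setup: at most 12 tasks per sequence, 5k training steps per task.
# All names and bodies here are illustrative placeholders, not the authors' code.

import random

MAX_SEQUENCE_LENGTH = 12   # maximum number of tasks in one training sequence
STEPS_PER_TASK = 5_000     # fixed training budget on each designed task


class TeacherPolicy:
    """Hypothetical meta-teacher that maps a student-policy encoding to a task."""

    def propose_task(self, student_encoding):
        # Placeholder: a real teacher would condition on the encoding (and, in
        # the meta-RL variant, on a latent context inferred from the student).
        return {"difficulty": random.random()}

    def update(self, student_encoding, task, student_return):
        # Placeholder for the teacher's RL update, rewarded by student progress.
        pass


class StudentAgent:
    """Hypothetical student RL agent with a fixed per-task training budget."""

    def train_on(self, task, num_steps):
        # Placeholder: run an RL algorithm (e.g. DQN or PPO) on the task for
        # `num_steps` environment steps and return an evaluation score.
        return random.random() + task["difficulty"]

    def policy_encoding(self):
        # Placeholder for the scalable policy-encoding step, e.g. the student's
        # behaviour on a set of diverse probe environments collected with a
        # Quality Diversity method.
        return [random.random() for _ in range(8)]


def run_training_sequence(teacher, student):
    """One teacher episode: design and train on up to MAX_SEQUENCE_LENGTH tasks."""
    for step in range(MAX_SEQUENCE_LENGTH):
        encoding = student.policy_encoding()
        task = teacher.propose_task(encoding)
        score = student.train_on(task, STEPS_PER_TASK)
        teacher.update(encoding, task, score)
        print(f"task {step + 1:2d}: difficulty={task['difficulty']:.2f}, score={score:.2f}")


if __name__ == "__main__":
    run_training_sequence(TeacherPolicy(), StudentAgent())
```

The sketch only fixes the sequence-level structure (finite sequence length, fixed per-task budget, teacher conditioned on a student encoding); the interesting parts of the method, the QD-based encoding and the context-based meta-RL teacher, would replace the stubbed methods.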