Adaptive Procedural Task Generation for Hard-Exploration Problems

Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.
Researcher Affiliation | Collaboration | Kuan Fang (Stanford University, kuanfang@stanford.edu); Yuke Zhu (UT Austin & Nvidia, yukez@cs.utexas.edu); Silvio Savarese (Stanford University, ssilvio@stanford.edu); Li Fei-Fei (Stanford University, feifeili@stanford.edu)
Pseudocode | Yes | Algorithm 1: Adaptive Procedural Task Generation (APT-Gen)
Open Source Code | Yes | Project page: https://kuanfang.github.io/apt-gen/
Open Datasets | Yes | The Grid-World domain is based on the popular benchmark for RL research (Chevalier-Boisvert et al., 2018).
Dataset Splits | No | The paper describes continuous data collection from environments and evaluation, but it does not specify traditional training/validation/test dataset splits (e.g., percentages or sample counts) for a fixed dataset.
Hardware Specification | Yes | During each run, the method is trained on a single NVIDIA GeForce GTX 1080 Ti GPU and 8 CPU cores with 32 GB memory.
Software Dependencies | No | The paper mentions software like TensorFlow and a physics engine, but it does not provide specific version numbers for these or any other ancillary software components.
Experiment Setup | Yes | For all experiments, we use the ADAM optimizer (Kingma & Ba, 2014) with a learning rate of 3×10^-4, β1 = 0.9, β2 = 0.999, and a batch size of 128. In total, 10,000 environment steps are collected to initialize the replay buffers. ... Specifically, we use δ = 0.5 with a tolerance of 0.1. If E[Σ_t γ^t r_t] < 0.4, β ← min(2β, 8); if E[Σ_t γ^t r_t] > 0.6, β ← max(β/2, 1/8).
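The β-adjustment rule quoted above can be sketched in a few lines. This is a minimal illustration of the stated thresholds only (δ = 0.5 with a tolerance of 0.1, doubling or halving β clamped to [1/8, 8]); the function and variable names are hypothetical and not taken from the paper's released code.

```python
def update_beta(beta: float, expected_return: float,
                delta: float = 0.5, tolerance: float = 0.1) -> float:
    """Adapt the coefficient beta from the expected discounted return
    E[sum_t gamma^t r_t], per the rule quoted from the paper's setup."""
    if expected_return < delta - tolerance:   # E[...] < 0.4: task too hard
        return min(beta * 2, 8.0)             # double beta, capped at 8
    if expected_return > delta + tolerance:   # E[...] > 0.6: task too easy
        return max(beta / 2, 1.0 / 8.0)       # halve beta, floored at 1/8
    return beta                               # within tolerance: unchanged
```

For example, `update_beta(1.0, 0.3)` returns 2.0, while `update_beta(1.0, 0.7)` returns 0.5, and a return inside the [0.4, 0.6] band leaves β untouched.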