Adaptive Procedural Task Generation for Hard-Exploration Problems
Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations. |
| Researcher Affiliation | Collaboration | Kuan Fang (Stanford University, kuanfang@stanford.edu); Yuke Zhu (UT Austin & Nvidia, yukez@cs.utexas.edu); Silvio Savarese (Stanford University, ssilvio@stanford.edu); Li Fei-Fei (Stanford University, feifeili@stanford.edu) |
| Pseudocode | Yes | Algorithm 1 Adaptive Procedural Task Generation (APT-Gen) |
| Open Source Code | Yes | 1Project page: https://kuanfang.github.io/apt-gen/ |
| Open Datasets | Yes | The Grid-World domain is based on the popular benchmark for RL research (Chevalier-Boisvert et al., 2018). |
| Dataset Splits | No | The paper describes continuous data collection from environments and evaluation, but it does not specify traditional training/validation/test dataset splits (e.g., percentages or sample counts) for a fixed dataset. |
| Hardware Specification | Yes | During each run, the method is trained on a single NVIDIA GeForce GTX1080 Ti GPU and 8 CPU cores with 32 GB memory. |
| Software Dependencies | No | The paper mentions software like TensorFlow and a physics engine, but it does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For all experiments, we use the ADAM optimizer (Kingma & Ba, 2014) with a learning rate of 3 × 10⁻⁴, β1 = 0.9, β2 = 0.999, and a batch size of 128. In total, 10,000 environment steps are collected to initialize the replay buffers. ... Specifically, we use δ = 0.5 with a tolerance of 0.1. If E[Σ_t γ^t r_t] < 0.4, β ← min(2β, 8); if E[Σ_t γ^t r_t] > 0.6, β ← max(β/2, 1/8). |
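The quoted experiment setup includes a simple rule for adapting the coefficient β based on the agent's expected discounted return. Below is a minimal sketch of that rule, not the authors' code: the function name `update_beta` and its signature are illustrative, and the thresholds 0.4 and 0.6 follow from the quoted δ = 0.5 with a tolerance of 0.1.

```python
def update_beta(beta, expected_return, delta=0.5, tolerance=0.1):
    """Sketch of the beta adaptation rule quoted above (names are illustrative).

    beta is doubled when the expected discounted return E[sum_t gamma^t r_t]
    falls below delta - tolerance, halved when it exceeds delta + tolerance,
    and clamped to the range [1/8, 8] in both cases.
    """
    if expected_return < delta - tolerance:      # E[sum_t gamma^t r_t] < 0.4
        beta = min(beta * 2.0, 8.0)
    elif expected_return > delta + tolerance:    # E[sum_t gamma^t r_t] > 0.6
        beta = max(beta / 2.0, 1.0 / 8.0)
    return beta
```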