Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Adaptive Procedural Task Generation for Hard-Exploration Problems
Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations. |
| Researcher Affiliation | Collaboration | Kuan Fang, Stanford University, EMAIL; Yuke Zhu, UT Austin & Nvidia, EMAIL; Silvio Savarese, Stanford University, EMAIL; Li Fei-Fei, Stanford University, EMAIL |
| Pseudocode | Yes | Algorithm 1 Adaptive Procedural Task Generation (APT-Gen) |
| Open Source Code | Yes | Project page: https://kuanfang.github.io/apt-gen/ |
| Open Datasets | Yes | The Grid-World domain is based on the popular benchmark for RL research (Chevalier-Boisvert et al., 2018). |
| Dataset Splits | No | The paper describes continuous data collection from environments and evaluation, but it does not specify traditional training/validation/test dataset splits (e.g., percentages or sample counts) for a fixed dataset. |
| Hardware Specification | Yes | During each run, the method is trained on a single NVIDIA GeForce GTX1080 Ti GPU and 8 CPU cores with 32 GB memory. |
| Software Dependencies | No | The paper mentions software like TensorFlow and a physics engine, but it does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For all experiments, we use the ADAM optimizer (Kingma & Ba, 2014) with learning rate of $3 \times 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$ and the batch size of 128. Totally 10,000 environment steps are collected to initialize the replay buffers. ... Specifically, we use $\delta = 0.5$ with a tolerance of 0.1. If $\mathbb{E}[\sum_t \gamma^t r_t] < 0.4$, $\beta \leftarrow \min(\beta \times 2, 8)$; if $\mathbb{E}[\sum_t \gamma^t r_t] > 0.6$, $\beta \leftarrow \max(\beta / 2, 1/8)$. |
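
The adaptive coefficient rule quoted in the Experiment Setup row can be sketched as a small function. This is a minimal illustration of the thresholding logic only, not the authors' implementation: the function name `update_beta` and its parameter names are hypothetical, and the estimate of the expected discounted return is assumed to be computed elsewhere and passed in.

```python
def update_beta(beta, expected_return, delta=0.5, tolerance=0.1,
                beta_min=1 / 8, beta_max=8.0):
    """Return the adjusted coefficient beta (hypothetical helper).

    Follows the rule quoted in the table: with delta = 0.5 and a
    tolerance of 0.1, beta is doubled (capped at 8) when the expected
    discounted return falls below 0.4, and halved (floored at 1/8)
    when it exceeds 0.6. Inside the tolerance band beta is unchanged.
    """
    if expected_return < delta - tolerance:
        return min(beta * 2, beta_max)
    if expected_return > delta + tolerance:
        return max(beta / 2, beta_min)
    return beta


# Example usage with made-up return estimates.
beta = 1.0
for estimate in (0.2, 0.3, 0.5, 0.9):
    beta = update_beta(beta, estimate)
    print(f"expected return {estimate:.1f} -> beta {beta}")
```

The default arguments mirror the values reported in the table; how the expected return is estimated, and what the coefficient weights in the training objective, are described in the paper itself.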