Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control
Authors: Glen Berseth, Cheng Xie, Paul Cernek, Michiel Van de Panne
ICLR 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our progressive learning and integration via distillation (PLAID) method against three alternative baselines. |
| Researcher Affiliation | Academia | University of British Columbia |
| Pseudocode | No | The paper describes methods and frameworks but does not contain pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of its source code. |
| Open Datasets | No | The paper describes a simulated environment with a '2D humanoid walker (pdbiped)' and 'randomly generated' terrain types, but it does not use or provide access to a public or open dataset with a concrete link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) to reproduce the data partitioning. |
| Hardware Specification | No | The paper states 'Each training simulation takes approximately 5 hours across 8 threads', but it does not specify any particular CPU model, GPU model, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions 'Stochastic Gradient Decent (SGD) with momentum' and 'Python', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all of our experiments we linearly anneal ϵ from 0.2 to 0.1 in 100,000 iterations and leave it from that point on. Each training simulation takes approximately 5 hours across 8 threads. For network training we use Stochastic Gradient Decent (SGD) with momentum. During the distillation step we use gradually anneal the probability of selecting an expert action from 1 to 0 over 10,000 iterations. |
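
The two annealing schedules quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the expert-action schedule is assumed to be linear because the quoted text only says the probability is annealed "gradually".

```python
def linear_anneal(step, start, end, duration):
    """Interpolate linearly from start to end over `duration` steps, then hold `end`."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Clamp progress to [0, 1] so the value stays at `end` after `duration` steps.
    frac = min(max(step / duration, 0.0), 1.0)
    return start + frac * (end - start)


def epsilon_at(iteration):
    """Epsilon schedule from the quoted setup: 0.2 to 0.1 over 100,000 iterations."""
    return linear_anneal(iteration, 0.2, 0.1, 100_000)


def expert_action_prob(iteration):
    """Probability of selecting an expert action during distillation:
    1 to 0 over 10,000 iterations. Linear shape is an assumption."""
    return linear_anneal(iteration, 1.0, 0.0, 10_000)


if __name__ == "__main__":
    for it in (0, 5_000, 10_000, 50_000, 100_000, 150_000):
        print(it, epsilon_at(it), expert_action_prob(it))
```

Clamping the progress fraction reproduces the "leave it from that point on" behavior: past the final iteration of each schedule the value stays at its end point.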