Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control
Authors: Glen Berseth, Cheng Xie, Paul Cernek, Michiel van de Panne
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our progressive learning and integration via distillation (PLAID) method against three alternative baselines. |
| Researcher Affiliation | Academia | University of British Columbia |
| Pseudocode | No | The paper describes methods and frameworks but does not contain pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide a link or an explicit statement about the availability of its source code. |
| Open Datasets | No | The paper describes a simulated environment with a '2D humanoid walker (pdbiped)' and 'randomly generated' terrain types, but it does not use or provide access to a public or open dataset with a concrete link, DOI, or formal citation. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) to reproduce the data partitioning. |
| Hardware Specification | No | The paper states 'Each training simulation takes approximately 5 hours across 8 threads', but it does not specify any particular CPU model, GPU model, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions 'Stochastic Gradient Descent (SGD) with momentum' and 'Python', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all of our experiments we linearly anneal ϵ from 0.2 to 0.1 over 100,000 iterations and keep it fixed from that point on. Each training simulation takes approximately 5 hours across 8 threads. For network training we use Stochastic Gradient Descent (SGD) with momentum. During the distillation step we gradually anneal the probability of selecting an expert action from 1 to 0 over 10,000 iterations. |
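The experiment-setup row describes two linear annealing schedules: the exploration parameter ϵ goes from 0.2 to 0.1 over 100,000 iterations and is then held fixed, and during distillation the probability of selecting the expert's action goes from 1 to 0 over 10,000 iterations. A minimal sketch of such a schedule is shown below; the function name and interface are illustrative, not taken from the paper.

```python
def linear_anneal(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from `start` to `end` over `total_steps`,
    then hold at `end` for all later steps."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

# Exploration noise: anneal epsilon from 0.2 to 0.1 over 100,000 iterations.
eps_mid = linear_anneal(0.2, 0.1, 50_000, 100_000)   # halfway: 0.15
eps_late = linear_anneal(0.2, 0.1, 500_000, 100_000) # held at 0.1 after annealing

# Distillation: anneal the expert-action probability from 1 to 0 over 10,000 iterations.
p_expert_mid = linear_anneal(1.0, 0.0, 5_000, 10_000)  # halfway: 0.5
```

In practice the annealed expert-action probability would be compared against a uniform random draw each step to decide whether the student follows the expert or its own policy.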