Data augmentation for efficient learning from parametric experts
Authors: Alexandre Galashov, Josh S. Merel, Nicolas Heess
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the benefit of our method in the context of several existing and widely used algorithms that include policy cloning as a constituent part. Moreover, we highlight the benefits of our approach in two practically relevant settings: (a) expert compression, i.e. transfer to a student with fewer parameters; and (b) transfer from privileged experts, i.e. where the expert has a different observation space than the student, usually including access to privileged information. To study how our method performs on complex control domains, we consider three complex, high-DoF continuous control tasks: Humanoid Run, Humanoid Walls and Insert Peg. |
| Researcher Affiliation | Industry | Alexandre Galashov, DeepMind, agalashov@deepmind.com; Josh Merel, DeepMind, jsmerel@gmail.com; Nicolas Heess, DeepMind, heess@deepmind.com |
| Pseudocode | Yes | We illustrate it in Figure 1 and we formulate the APC algorithm for BC in Algorithm 1. Algorithm 1: Augmented Policy Cloning (APC) |
| Open Source Code | No | Did you include the license to the code and datasets? [No] The code and the data are proprietary. |
| Open Datasets | Yes | To study how our method performs on complex control domains, we consider three complex, high-DoF continuous control tasks: Humanoid Run, Humanoid Walls and Insert Peg. All these domains are implemented using the MuJoCo physics engine [Todorov et al., 2012] and are available in the dm_control repository [Tunyasuvunakool et al., 2020]. |
| Dataset Splits | Yes | We apply early stopping and select hyperparameters based on the evaluation performance on a validation set. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [TODO] |
| Software Dependencies | No | The paper mentions software like the MuJoCo physics engine and algorithms like MPO and VMPO, but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In the subsequent BC experiments, we use σ_E = 0.2. Moreover, in order to analyze the noise robustness of the student policy trained via BC, π(·\|s) = N(µ(s), σ(s)), we evaluate it by executing the action drawn from a Gaussian with a fixed variance, i.e. a ∼ N(µ(s), σ), where σ is the fixed amount of student noise. In all the experiments we use σ = 0.2. |
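
The evaluation protocol quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code (the paper states that the code is proprietary). The function name `sample_eval_action` is hypothetical, and the sketch assumes that σ is used as a per-dimension standard deviation, which the quoted text does not state explicitly.

```python
import numpy as np


def sample_eval_action(mean, sigma=0.2, rng=None):
    """Draw a ~ N(mean, sigma) using a fixed noise scale.

    `mean` stands in for the student's predicted mean mu(s); the learned
    sigma(s) is ignored at evaluation time. Treating `sigma` as a standard
    deviation is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    # Add isotropic Gaussian noise of fixed scale to every action dimension.
    return mean + sigma * rng.standard_normal(mean.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.zeros(4)  # placeholder for mu(s) from a trained student
    print(sample_eval_action(mu, sigma=0.2, rng=rng))
```

Setting `sigma=0.0` recovers the deterministic action µ(s), which makes it straightforward to compare noisy and noise-free evaluation of the same student.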