Delayed Reinforcement Learning by Imitation
Authors: Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that DIDA obtains high performances with a remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading. |
| Researcher Affiliation | Academia | 1Politecnico di Milano, Milan, Italy. |
| Pseudocode | Yes | Algorithm 1 Delayed Imitation with DAGGER (DIDA) |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | We use the version from the library gym (Brockman et al., 2016). Mujoco: Continuous robotic locomotion control tasks realized with an advanced physics simulator from the library mujoco (Todorov et al., 2012). |
| Dataset Splits | Yes | Finally, the expert has been selected by performing validation of its hyper-parameters on 2018, it is therefore possible to do validation on the delayed dataset of 2018 in order to select an expert which, albeit trained on undelayed data, performs well on delayed data. We refer to this expert as delayed expert. ... The second iteration of DIDA has been selected by validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "gym", "mujoco", "XGBoost", "Extra Trees", "Adam", "ReLU" but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | More details and all hyper-parameters are reported in Appendix E.2. (Tables 1-7 in Appendix E.2 provide detailed hyper-parameters for DIDA and all baselines, including learning rates, batch sizes, epochs, etc.) |
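The pseudocode row above refers to Algorithm 1, a DAgger-style imitation loop in which a delayed policy learns to mimic an undelayed expert. The following is a minimal, hedged sketch of that idea only; the toy scalar environment, the 1-nearest-neighbour learner, and all names (`expert`, `rollout`, `NearestNeighbourPolicy`) are illustrative assumptions, not the authors' implementation.

```python
import random

DELAY = 2  # the delayed agent observes the state DELAY steps late (assumed constant)

def expert(state):
    # Undelayed expert: drives the scalar state towards 0 (toy stand-in).
    return -0.5 * state

class NearestNeighbourPolicy:
    # Toy learner mapping augmented states (delayed state + action buffer)
    # to expert-labelled actions via 1-nearest neighbour.
    def __init__(self):
        self.data = []  # aggregated list of (augmented_state, expert_action)

    def act(self, aug_state):
        if not self.data:
            return 0.0
        nearest = min(self.data,
                      key=lambda d: sum((x - y) ** 2
                                        for x, y in zip(d[0], aug_state)))
        return nearest[1]

def rollout(policy, steps=30, seed=0):
    # Run the delayed learner, labelling each augmented state with the
    # expert's action on the *current* true state (DAgger-style labelling).
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    states = [state] * (DELAY + 1)   # padded history of true states
    actions = [0.0] * DELAY          # last DELAY actions taken
    samples = []
    for _ in range(steps):
        # Augmented state: delayed observation plus the action buffer.
        aug = tuple([states[-DELAY - 1]] + actions)
        a = policy.act(aug)
        samples.append((aug, expert(states[-1])))
        state = states[-1] + a + rng.gauss(0.0, 0.01)
        states.append(state)
        actions = actions[1:] + [a]
    return samples

policy = NearestNeighbourPolicy()
for it in range(3):  # a few DAgger iterations, aggregating the dataset
    policy.data.extend(rollout(policy, seed=it))

print(len(policy.data))  # aggregated dataset size: 3 iterations x 30 steps
```

The key structural point this sketch illustrates is the augmented-state construction (delayed state plus recent actions), which is what lets a memoryless learner imitate an expert that sees the undelayed state.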