Delayed Reinforcement Learning by Imitation

Authors: Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli

ICML 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show empirically that DIDA obtains high performances with a remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading." |
| Researcher Affiliation | Academia | "Politecnico di Milano, Milan, Italy." |
| Pseudocode | Yes | "Algorithm 1 Delayed Imitation with DAGGER (DIDA)" (a hedged sketch of such a loop follows this table) |
| Open Source Code | No | The paper provides no explicit statement of, or link to, open-source code. |
| Open Datasets | Yes | "We use the version from the library gym (Brockman et al., 2016)." Mujoco: "Continuous robotic locomotion control tasks realized with an advanced physics simulator from the library mujoco (Todorov et al., 2012)." (a loading sketch follows this table) |
| Dataset Splits | Yes | "Finally, the expert has been selected by performing validation of its hyper-parameters on 2018; it is therefore possible to do validation on the delayed dataset of 2018 in order to select an expert which, albeit trained on undelayed data, performs well on delayed data. We refer to this expert as delayed expert. ... The second iteration of DIDA has been selected by validation." |
| Hardware Specification | No | The paper does not specify the hardware used to run its experiments. |
| Software Dependencies | No | The paper names software such as gym, mujoco, XGBoost, Extra Trees, Adam, and ReLU, but does not give version numbers for these components. |
| Experiment Setup | Yes | "More details and all hyper-parameters are reported in Appendix E.2." Tables 1-7 in Appendix E.2 list the hyper-parameters for DIDA and all baselines, including learning rates, batch sizes, and epochs. |
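The Pseudocode row above points to Algorithm 1, Delayed Imitation with DAGGER (DIDA). As a reading aid, here is a minimal Python sketch of a DAgger-style data-collection pass under a constant observation delay. Everything in it is an assumption made for illustration, not the paper's implementation: the gymnasium-style `reset`/`step` API, the augmented state built from the last observed state plus the buffered actions, the beta-mixing rule, and all function names are hypothetical.

```python
from collections import deque

import numpy as np


def collect_dagger_data(env, expert, learner, delay, beta, horizon=200):
    """One DAgger-style pass under a constant observation delay (illustrative).

    `expert` maps the current (undelayed) state to an action and provides the
    labels; `learner` maps an augmented state (delayed state plus the `delay`
    buffered actions) to an action. Returns (augmented state, expert label) pairs.
    """
    xs, ys = [], []
    state, _ = env.reset()  # gymnasium-style API assumed
    # The agent observes states `delay` steps late; buffer the pending ones.
    obs_buffer = deque([state] * (delay + 1), maxlen=delay + 1)
    act_buffer = deque([np.zeros(env.action_space.shape)] * delay, maxlen=delay)
    for _ in range(horizon):
        delayed_state = obs_buffer[0]  # what the delayed agent actually sees
        aug = np.concatenate([delayed_state, *act_buffer])
        label = expert(state)  # expert queried on the undelayed state
        # Classic DAgger mixing: execute the expert's action with probability beta.
        action = label if np.random.rand() < beta else learner(aug)
        xs.append(aug)
        ys.append(label)
        state, _, terminated, truncated, _ = env.step(action)
        obs_buffer.append(state)
        act_buffer.append(action)
        if terminated or truncated:
            break
    return xs, ys
```

In the usual DAgger loop, the pairs are aggregated across passes, the learner is refit on the aggregate, and beta is decayed; any supervised regressor would do, e.g. scikit-learn's `ExtraTreesRegressor` (Extra Trees is one of the models the report above mentions), via `model.fit(np.stack(xs), np.stack(ys))` and `learner = lambda a: model.predict(a[None])[0]`.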
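The Open Datasets row names only the gym and mujoco libraries, not the exact tasks. A minimal, assumption-labelled sketch of instantiating such environments is below; the environment IDs are hypothetical examples of classic-control and mujoco locomotion tasks, and the IDs themselves vary across gym versions.

```python
import gym

# Hypothetical environment IDs; the mujoco tasks additionally require a
# working mujoco installation.
for env_id in ["Pendulum-v1", "HalfCheetah-v3", "Walker2d-v3"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```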