Delayed Reinforcement Learning by Imitation
Authors: Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that DIDA obtains high performances with a remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading. |
| Researcher Affiliation | Academia | 1Politecnico di Milano, Milan, Italy. |
| Pseudocode | Yes | Algorithm 1 Delayed Imitation with DAGGER (DIDA) |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | We use the version from the library gym (Brockman et al., 2016). Mujoco: Continuous robotic locomotion control tasks realized with an advanced physics simulator from the library mujoco (Todorov et al., 2012). |
| Dataset Splits | Yes | Finally, the expert has been selected by performing validation of its hyper-parameters on 2018, it is therefore possible to do validation on the delayed dataset of 2018 in order to select an expert which, albeit trained on undelayed data, performs well on delayed data. We refer to this expert as delayed expert. ... The second iteration of DIDA has been selected by validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "gym", "mujoco", "XGBoost", "Extra Trees", "Adam", "ReLU" but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | More details and all hyper-parameters are reported in Appendix E.2. (Tables 1-7 in Appendix E.2 provide detailed hyper-parameters for DIDA and all baselines, including learning rates, batch sizes, epochs, etc.) |
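The pseudocode row above refers to Algorithm 1, a DAgger-style imitation loop in which a delayed policy learns to mimic an undelayed expert. The following is a minimal, hedged sketch of that idea only; the toy scalar environment, the 1-nearest-neighbour learner, and all names (`expert`, `rollout`, `NearestNeighbourPolicy`) are illustrative assumptions, not the authors' implementation.

```python
import random

DELAY = 2  # the delayed agent observes the state DELAY steps late (assumed constant)

def expert(state):
    # Undelayed expert: drives the scalar state towards 0 (toy stand-in).
    return -0.5 * state

class NearestNeighbourPolicy:
    # Toy learner mapping augmented states (delayed state + action buffer)
    # to expert-labelled actions via 1-nearest neighbour.
    def __init__(self):
        self.data = []  # aggregated list of (augmented_state, expert_action)

    def act(self, aug_state):
        if not self.data:
            return 0.0
        nearest = min(self.data,
                      key=lambda d: sum((x - y) ** 2
                                        for x, y in zip(d[0], aug_state)))
        return nearest[1]

def rollout(policy, steps=30, seed=0):
    # Run the delayed learner, labelling each augmented state with the
    # expert's action on the *current* true state (DAgger-style labelling).
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    states = [state] * (DELAY + 1)   # padded history of true states
    actions = [0.0] * DELAY          # last DELAY actions taken
    samples = []
    for _ in range(steps):
        # Augmented state: delayed observation plus the action buffer.
        aug = tuple([states[-DELAY - 1]] + actions)
        a = policy.act(aug)
        samples.append((aug, expert(states[-1])))
        state = states[-1] + a + rng.gauss(0.0, 0.01)
        states.append(state)
        actions = actions[1:] + [a]
    return samples

policy = NearestNeighbourPolicy()
for it in range(3):  # a few DAgger iterations, aggregating the dataset
    policy.data.extend(rollout(policy, seed=it))

print(len(policy.data))  # aggregated dataset size: 3 iterations x 30 steps
```

The key structural point this sketch illustrates is the augmented-state construction (delayed state plus recent actions), which is what lets a memoryless learner imitate an expert that sees the undelayed state.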