reproducibilityindex.ai

Tight Performance Guarantees of Imitator Policies with Continuous Actions

Authors: Davide Maran, Alberto Maria Metelli, Marcello Restelli

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Figure 3, we show the results of testing this statement on some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. In this simulation, we first train an expert policy with DDPG (Lillicrap et al. 2015), TD3 (Fujimoto, Hoof, and Meger 2018) and PPO (Schulman et al. 2017) in the following OpenAI Gym environments: Pendulum-v0, LunarLanderContinuous-v2, BipedalWalker-v3. Then, we evaluated the performance of these experts under injection of Gaussian noise with different standard deviations. Figure 3: The performance of the expert Jπ as a function of the standard deviation of the noise σ. The performance is measured on 40 episodes in the environment, repeated for 20 different random seeds (shaded area represents the 95% non-parametric c.i.).
Researcher Affiliation	Academia	Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy {davide.maran, albertomaria.metelli, marcello.restelli}@polimi.it
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets	Yes	some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. ... Pendulum-v0: ... LunarLanderContinuous-v2: ... BipedalWalker-v3:
Dataset Splits	No	The paper describes experimental runs (e.g., '40 episodes in the environment, repeated for 20 different random seeds') but does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning for training, validation, or testing.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper names the algorithms DDPG, TD3, and PPO and the OpenAI Gym library, but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	The performance is measured on 40 episodes in the environment, repeated for 20 different random seeds (shaded area represents the 95% non-parametric c.i.). Then, we evaluated the performance of these experts under injection of Gaussian noise with different standard deviations. Details can be found in Appendix.
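
The evaluation protocol quoted in the table (40 episodes, 20 random seeds, Gaussian noise added to the expert's actions, a 95% non-parametric confidence interval) can be illustrated with a minimal sketch. This is not the authors' code. The toy one-dimensional environment, the `expert_policy` function, and the percentile bootstrap are all assumptions made for illustration. The paper uses OpenAI Gym environments with experts trained by DDPG, TD3, and PPO, and the excerpt does not say which non-parametric interval was used.

```python
import random
import statistics


def expert_policy(state):
    """Hypothetical expert: drive the scalar state toward zero."""
    return -state


def run_episode(sigma, rng, horizon=50):
    """Return of one episode in a toy 1-D environment (stand-in for a Gym task)."""
    state = rng.uniform(-1.0, 1.0)
    total_reward = 0.0
    for _ in range(horizon):
        # Noise injection: perturb the expert's action with Gaussian noise.
        action = expert_policy(state) + rng.gauss(0.0, sigma)
        state = state + action          # toy linear dynamics (assumption)
        total_reward += -state * state  # quadratic cost as negative reward
    return total_reward


def evaluate(sigma, n_seeds=20, n_episodes=40):
    """Mean return over n_episodes, computed once per seed."""
    per_seed = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        returns = [run_episode(sigma, rng) for _ in range(n_episodes)]
        per_seed.append(statistics.fmean(returns))
    return per_seed


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean (one non-parametric choice)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_resamples)
    hi_idx = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.5, 1.0):
        per_seed = evaluate(sigma)
        low, high = bootstrap_ci(per_seed)
        mean = statistics.fmean(per_seed)
        print(f"sigma={sigma:.1f} mean={mean:.3f} ci=({low:.3f}, {high:.3f})")
```

Resampling the per-seed means rather than individual episodes treats each seed as one independent observation, which matches the "repeated for 20 different random seeds" wording. Whether the paper aggregates in the same way is not stated in the excerpt.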