Tight Performance Guarantees of Imitator Policies with Continuous Actions

Authors: Davide Maran, Alberto Maria Metelli, Marcello Restelli

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 3, we show the results of testing this statement on some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. In this simulation, we first trained an expert policy with DDPG (Lillicrap et al. 2015), TD3 (Fujimoto, Hoof, and Meger 2018), and PPO (Schulman et al. 2017) in the following OpenAI Gym environments: Pendulum-v0, LunarLanderContinuous-v2, and BipedalWalker-v3. Then, we evaluated the performance of these experts under Gaussian noise injection with different standard deviations. Figure 3: The performance of the expert Jπ as a function of the standard deviation σ of the noise. The performance is measured on 40 episodes in each environment, repeated for 20 different random seeds (the shaded area represents the 95% non-parametric confidence interval).
Researcher Affiliation | Academia | Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milan, Italy; {davide.maran, albertomaria.metelli, marcello.restelli}@polimi.it
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology it describes.
Open Datasets | Yes | "some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. ... Pendulum-v0: ... LunarLanderContinuous-v2: ... BipedalWalker-v3:"
Dataset Splits | No | The paper describes experimental runs (e.g., "40 episodes in each environment, repeated for 20 different random seeds") but does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the partitioning into training, validation, and test data.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) used to run its experiments.
Software Dependencies | No | The paper mentions the algorithms DDPG, TD3, and PPO and the OpenAI Gym library, but does not provide the ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | The performance is measured on 40 episodes in each environment, repeated for 20 different random seeds (the shaded area represents the 95% non-parametric confidence interval). Then, we evaluated the performance of these experts under Gaussian noise injection with different standard deviations. Details can be found in the Appendix.
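Because the code is not released, the evaluation protocol in the Experiment Setup row (Gaussian noise injection at several standard deviations, 40 episodes per environment, 20 random seeds, a 95% non-parametric confidence interval) can only be sketched. The following is a hypothetical reconstruction, not the authors' implementation. It rests on several assumptions. Noise is added to every action component and the result is clipped to the action bounds. The "non-parametric c.i." is taken to be a percentile bootstrap over per-seed mean returns. The environment exposes a simplified `reset()`/`step()` interface returning `(obs, reward, done)`. `ToyEnv`, `evaluate_with_noise`, `bootstrap_ci`, and `noise_sweep` are all illustrative names.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Policy = Callable[[Sequence[float]], List[float]]


class ToyEnv:
    """Hypothetical stand-in for a Gym task: constant observation,
    reward -a[0]**2 (so the zero action is optimal), fixed horizon."""

    def __init__(self, horizon: int = 10):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> List[float]:
        self.t = 0
        return [0.0]

    def step(self, action: List[float]) -> Tuple[List[float], float, bool]:
        # Simplified (obs, reward, done) interface, NOT the real Gym tuple.
        self.t += 1
        return [0.0], -action[0] ** 2, self.t >= self.horizon


def evaluate_with_noise(env, policy: Policy, sigma: float, n_episodes: int,
                        rng: random.Random, low: float, high: float) -> float:
    """Mean undiscounted return when every action component is perturbed by
    N(0, sigma^2) noise and then clipped to [low, high] (assumed scheme)."""
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = [min(max(a + rng.gauss(0.0, sigma), low), high)
                      for a in policy(obs)]
            obs, reward, done = env.step(action)
            total += reward
        returns.append(total)
    return statistics.fmean(returns)


def bootstrap_ci(values: Sequence[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval of the mean."""
    rng = random.Random(seed)
    means = sorted(statistics.fmean(rng.choices(values, k=len(values)))
                   for _ in range(n_resamples))
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def noise_sweep(make_env: Callable[[], object], policy: Policy,
                sigmas: Sequence[float], n_seeds: int = 20,
                n_episodes: int = 40, low: float = -1.0, high: float = 1.0
                ) -> Dict[float, Tuple[float, Tuple[float, float]]]:
    """For each sigma: mean over seeds of the per-seed average return,
    plus a bootstrap CI computed across the seeds."""
    results = {}
    for sigma in sigmas:
        per_seed = [evaluate_with_noise(make_env(), policy, sigma, n_episodes,
                                        random.Random(seed), low, high)
                    for seed in range(n_seeds)]
        results[sigma] = (statistics.fmean(per_seed), bootstrap_ci(per_seed))
    return results


if __name__ == "__main__":
    # The zero action is optimal in ToyEnv, so it plays the role of the expert.
    for sigma, (mean, (lo, hi)) in noise_sweep(ToyEnv, lambda obs: [0.0],
                                               sigmas=[0.0, 0.1, 0.3]).items():
        print(f"sigma={sigma:.1f}  J={mean:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

To use a real Gym environment, you would need a thin adapter for its `reset`/`step` return tuples and would take `low`/`high` from `env.action_space`. This sketch resamples across seeds rather than across individual episodes, which treats each seed as one independent replicate. The paper's row does not say which unit was resampled.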