Tight Performance Guarantees of Imitator Policies with Continuous Actions
Authors: Davide Maran, Alberto Maria Metelli, Marcello Restelli
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 3, we show the results of testing this statement on some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. In this simulation, we first train an expert policy with DDPG (Lillicrap et al. 2015), TD3 (Fujimoto, Hoof, and Meger 2018) and PPO (Schulman et al. 2017) in the following OpenAI Gym environments: Pendulum-v0, LunarLanderContinuous-v2, BipedalWalker-v3. Then, we evaluated the performance of these experts under Gaussian noise injection with different standard deviations. Figure 3: The performance of the expert Jπ as a function of the standard deviation of the noise σ. The performance is measured on 40 episodes in the environment, repeated for 20 different random seeds (the shaded area represents the 95% non-parametric c.i.). |
| Researcher Affiliation | Academia | Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy {davide.maran, albertomaria.metelli, marcello.restelli}@polimi.it |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | some of the most common continuous-action environments of the OpenAI Gym (Brockman et al. 2016) library. ... Pendulum-v0: ... LunarLanderContinuous-v2: ... BipedalWalker-v3: |
| Dataset Splits | No | The paper describes experimental runs (e.g., '40 episodes in the environment, repeated for 20 different random seeds') but does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper names the algorithms DDPG, TD3, and PPO and the OpenAI Gym library, but does not provide the ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The performance is measured on 40 episodes in the environment, repeated for 20 different random seeds (the shaded area represents the 95% non-parametric c.i.). Then, we evaluated the performance of these experts under Gaussian noise injection with different standard deviations. Details can be found in the Appendix. |
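
The evaluation protocol quoted in the table (Gaussian noise added to the expert's actions, 40 episodes per seed, 20 seeds, a 95% non-parametric confidence interval) can be illustrated with a minimal sketch. Nothing below comes from the paper's code, which is not released: the toy 1-D environment, the linear `expert` policy, and the helper names `run_episode`, `evaluate`, and `bootstrap_ci` are all hypothetical stand-ins, and the percentile bootstrap over per-seed means is only one possible reading of "non-parametric c.i.".

```python
import random
import statistics


def run_episode(policy, sigma, rng, horizon=50):
    """Roll out one episode of a toy 1-D regulation task.

    Hypothetical stand-in for a Gym environment; returns the undiscounted return.
    """
    state = rng.uniform(-1.0, 1.0)
    total = 0.0
    for _ in range(horizon):
        # Noise injection: perturb the expert action with N(0, sigma^2).
        action = policy(state) + rng.gauss(0.0, sigma)
        action = max(-1.0, min(1.0, action))  # clip to the action bounds
        state = state + 0.1 * action          # toy linear dynamics (assumed)
        total += -(state ** 2)                # reward: stay close to zero
    return total


def expert(state):
    """Hypothetical expert: a linear controller pushing the state to zero."""
    return -2.0 * state


def evaluate(policy, sigma, n_seeds=20, n_episodes=40):
    """Return one mean episodic return per seed."""
    per_seed = []
    for seed in range(n_seeds):
        rng = random.Random(seed)
        returns = [run_episode(policy, sigma, rng) for _ in range(n_episodes)]
        per_seed.append(statistics.fmean(returns))
    return per_seed


def bootstrap_ci(values, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.5, 1.0):
        per_seed = evaluate(expert, sigma)
        lo, hi = bootstrap_ci(per_seed)
        mean = statistics.fmean(per_seed)
        print(f"sigma={sigma:.1f}  mean={mean:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

To approximate the reported setup more closely, `run_episode` would be replaced by rollouts in the actual Gym environments with a trained DDPG, TD3, or PPO expert; the seed loop and the interval computation would stay the same.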