Addressing Action Oscillations through Learning Policy Inertia
Authors: Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng | pp. 7020-7027
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a collection of autonomous driving tasks and several Atari games suggest that our approach demonstrates substantial oscillation reduction in comparison to a range of commonly adopted baselines with almost no performance degradation. |
| Researcher Affiliation | Collaboration | Chen Chen (1), Hongyao Tang (2, 1), Jianye Hao (1, 2), Wulong Liu (1), Zhaopeng Meng (2); (1) Noah's Ark Lab, Huawei; (2) College of Intelligence and Computing, Tianjin University |
| Pseudocode | Yes | Algorithm 1 Nested Policy Iteration (NPI) for PIC-augmented Policy |
| Open Source Code | No | The paper does not provide any specific statement or link indicating that the source code for their methodology is open-source or publicly available. |
| Open Datasets | Yes | We use the Highway simulator which includes a collection of autonomous driving scenarios, as well as several Atari games in Open AI-Gym in our experiments. Highway environments are originally provided at https://github.com/eleurent/highway-env. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (percentages, sample counts, or explicit cross-validation details) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Highway simulator' and 'Open AI-Gym' but does not specify version numbers for these or any other software dependencies, libraries, or frameworks. |
| Experiment Setup | Yes | We train five different instances of each algorithm with different random seeds, with each performing 20 evaluation rollouts with some other seed every 5000 environment steps. |
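The Pseudocode row refers to a PIC-augmented policy, but this page does not spell out its form. The following is a minimal sketch under a stated assumption: the Policy Inertia Controller (PIC) outputs a probability μ of repeating the previous action, and otherwise the action is drawn from the base policy π. Here μ is a fixed constant rather than a learned function, the names `pic_augmented_probs`, `count_action_switches`, and `rollout` are hypothetical, and the switch count is only an illustrative oscillation proxy, not necessarily the metric used in the paper. The sketch does not implement Nested Policy Iteration itself.

```python
import numpy as np


def pic_augmented_probs(pi_probs, prev_action, mu):
    """Assumed mixture: repeat prev_action with prob. mu, else follow pi."""
    pi_probs = np.asarray(pi_probs, dtype=float)
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    inertia = np.zeros_like(pi_probs)
    inertia[prev_action] = 1.0  # point mass on the previous action
    return mu * inertia + (1.0 - mu) * pi_probs


def count_action_switches(actions):
    """Hypothetical oscillation proxy: number of steps where a_t != a_{t-1}."""
    actions = np.asarray(actions)
    return int(np.sum(actions[1:] != actions[:-1]))


def rollout(pi_probs, mu, steps, seed=0):
    """Sample a discrete action sequence from the assumed PIC-augmented policy.

    The base policy is state-independent here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    n = len(pi_probs)
    action = int(rng.integers(n))  # arbitrary first action
    actions = [action]
    for _ in range(steps - 1):
        probs = pic_augmented_probs(pi_probs, action, mu)
        action = int(rng.choice(n, p=probs))
        actions.append(action)
    return actions
```

For example, `pic_augmented_probs([0.5, 0.3, 0.2], prev_action=2, mu=0.5)` gives approximately `[0.25, 0.15, 0.6]`. With `mu=1.0` every sampled action repeats the first one, so `count_action_switches` returns 0. Larger μ trades responsiveness for smoothness, which is the trade-off the PIC is meant to control adaptively.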
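The Experiment Setup row fixes three numbers: five training seeds, an evaluation every 5000 environment steps, and 20 rollouts per evaluation under a separate seed. The sketch below turns these into an explicit schedule. The total training budget is not given on this page, so `total_env_steps` is left as a parameter; the `evaluation_plan` name and the seed-offset scheme for evaluation seeds are hypothetical.

```python
from typing import Iterable, List, Tuple

# Values stated in the Experiment Setup row.
NUM_TRAIN_SEEDS = 5
EVAL_INTERVAL = 5_000  # environment steps between evaluations
EVAL_ROLLOUTS = 20     # rollouts per evaluation


def evaluation_plan(
    total_env_steps: int,
    train_seeds: Iterable[int] = range(NUM_TRAIN_SEEDS),
    interval: int = EVAL_INTERVAL,
    rollouts: int = EVAL_ROLLOUTS,
    eval_seed_offset: int = 10_000,  # hypothetical: keeps eval seeds apart from training seeds
) -> List[Tuple[int, int, List[int]]]:
    """Return (train_seed, env_step, eval_seeds) for every scheduled evaluation."""
    plan = []
    for train_seed in train_seeds:
        # Hypothetical seeding scheme: each training run gets its own block of eval seeds.
        eval_seeds = [eval_seed_offset + train_seed * rollouts + r for r in range(rollouts)]
        for step in range(interval, total_env_steps + 1, interval):
            plan.append((train_seed, step, list(eval_seeds)))
    return plan
```

With a hypothetical budget of 50,000 steps, `evaluation_plan(50_000)` yields 10 evaluation points per training seed, 50 entries in total, or 1,000 evaluation rollouts.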