Addressing Action Oscillations through Learning Policy Inertia

Authors: Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a collection of autonomous driving tasks and several Atari games suggest that our approach demonstrates substantial oscillation reduction in comparison to a range of commonly adopted baselines with almost no performance degradation. (See the oscillation-rate sketch after the table.)
Researcher Affiliation | Collaboration | Chen Chen (1), Hongyao Tang (2,1), Jianye Hao (1,2), Wulong Liu (1), Zhaopeng Meng (2); (1) Noah's Ark Lab, Huawei; (2) College of Intelligence and Computing, Tianjin University
Pseudocode | Yes | Algorithm 1: Nested Policy Iteration (NPI) for PIC-augmented policy. (See the policy-inertia sketch after the table.)
Open Source Code | No | The paper does not provide any specific statement or link indicating that the source code for their methodology is open-source or publicly available.
Open Datasets | Yes | We use the Highway simulator, which includes a collection of autonomous driving scenarios, as well as several Atari games in OpenAI Gym in our experiments. Highway environments are originally provided at https://github.com/eleurent/highway-env. (See the environment-setup sketch after the table.)
Dataset Splits | No | The paper does not explicitly provide specific dataset split information (percentages, sample counts, or explicit cross-validation details) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the 'Highway simulator' and 'OpenAI Gym' but does not specify version numbers for these or any other software dependencies, libraries, or frameworks.
Experiment Setup | Yes | We train five different instances of each algorithm with different random seeds, with each performing 20 evaluation rollouts with some other seed every 5000 environment steps. (See the evaluation-schedule sketch after the table.)
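
The Open Datasets row refers to the highway-env driving scenarios and Atari games in OpenAI Gym. As a minimal sketch of how these environments are typically instantiated with the older Gym API (the specific environment IDs and the reset/step signatures below are common defaults and are assumptions, not details reported in the paper):

```python
# Minimal environment-setup sketch (older Gym API). The exact environment IDs
# and preprocessing wrappers used in the paper are not reported; the ones
# below are common defaults and should be treated as assumptions.
import gym
import highway_env  # importing registers the highway-v0 family of environments

driving_env = gym.make("highway-v0")             # assumed highway-env scenario
atari_env = gym.make("BreakoutNoFrameskip-v4")   # assumed Atari game and version

obs = driving_env.reset()
obs, reward, done, info = driving_env.step(driving_env.action_space.sample())
```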
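The Pseudocode row names Algorithm 1, Nested Policy Iteration (NPI) for a PIC-augmented policy. The snippet below is not that algorithm; it is only a simplified sketch of the underlying idea of policy inertia, i.e. biasing action selection toward the previously taken action, with a fixed mixing weight standing in for the learned controller:

```python
import numpy as np

class InertiaPolicy:
    """Simplified policy-inertia sketch; NOT the paper's Algorithm 1 (NPI)."""

    def __init__(self, base_policy, inertia=0.3):
        self.base_policy = base_policy  # callable: observation -> action-probability vector
        self.inertia = inertia          # assumed fixed weight; the paper learns this behaviour
        self.prev_action = None

    def act(self, obs):
        probs = np.asarray(self.base_policy(obs), dtype=float)
        if self.prev_action is not None:
            boost = np.zeros_like(probs)
            boost[self.prev_action] = 1.0
            # Mix the base policy's distribution toward the previous action.
            probs = (1.0 - self.inertia) * probs + self.inertia * boost
        action = int(np.argmax(probs))
        self.prev_action = action
        return action
```

Raising the `inertia` weight trades responsiveness for fewer action switches, which is the trade-off the paper's learned controller is designed to manage.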
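The Research Type row quotes the paper's claim of substantial oscillation reduction. The paper defines its own oscillation measure; the helper below is only an assumed proxy, the fraction of consecutive timesteps on which the chosen action changes, for readers who want a quick way to quantify switching in a rollout:

```python
def action_switch_rate(actions):
    """Fraction of consecutive timesteps whose actions differ.

    An assumed proxy for action oscillation; not necessarily the
    measure reported in the paper.
    """
    if len(actions) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(actions[:-1], actions[1:]))
    return switches / (len(actions) - 1)

print(action_switch_rate([0, 0, 1, 1, 0, 2]))  # 3 switches over 5 transitions -> 0.6
```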
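The Experiment Setup row reports five training instances per algorithm with different random seeds, each evaluated with 20 rollouts every 5,000 environment steps. A skeleton of that evaluation schedule is sketched below; the random placeholder policy and the highway-v0 environment ID stand in for the trained agents and task set, and the training loop itself is elided:

```python
import gym
import highway_env  # registers highway-v0
import numpy as np

N_SEEDS = 5          # five training instances per algorithm
EVAL_EVERY = 5000    # evaluate every 5000 environment steps (training loop elided)
EVAL_ROLLOUTS = 20   # 20 evaluation rollouts per checkpoint

def evaluate_rollout(policy, env):
    """Run one rollout and return the undiscounted episode return."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs, env.action_space))
        total += reward
    return total

random_policy = lambda obs, space: space.sample()  # placeholder for a trained agent

for seed in range(N_SEEDS):
    eval_env = gym.make("highway-v0")
    eval_env.seed(seed + 1000)   # "some other seed" for evaluation rollouts
    returns = [evaluate_rollout(random_policy, eval_env) for _ in range(EVAL_ROLLOUTS)]
    print(f"seed {seed}: mean evaluation return {np.mean(returns):.2f}")
```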