Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
Authors: Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards, and meanwhile ignore unbeneficial shaping rewards or even transform them into beneficial ones. |
| Researcher Affiliation | Collaboration | 1. Netease Fuxi AI Lab, Netease, Inc., Hangzhou, China; 2. College of Intelligence and Computing, Tianjin University, Tianjin, China; 3. School of Computer Science and Technology, University of Science and Technology of China; 4. Noah's Ark Lab, Huawei, China |
| Pseudocode | No | The paper does not provide explicit pseudocode for the proposed algorithms. |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We conduct three groups of experiments. The first one is conducted in cartpole... We choose five MuJoCo tasks (Swimmer-v2, Hopper-v2, Humanoid-v2, Walker2d-v2, and HalfCheetah-v2) from OpenAI Gym to test our algorithms. |
| Dataset Splits | No | The paper specifies training steps and evaluation frequency (e.g., "a 20-episode evaluation is conducted every 4,000 steps"), but it does not provide explicit dataset splits for training, validation, and testing as percentages or sample counts from a fixed dataset, which is common in supervised learning. For the RL environments, evaluation is integrated into the training process. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications; it only mentions the general simulation environments (cartpole and MuJoCo). |
| Software Dependencies | No | The paper mentions using the PPO algorithm as the base learner, but it does not specify versions of the software libraries or frameworks required to reproduce the experiments. |
| Experiment Setup | Yes | The test of each method contains 1,200,000 training steps. During the training process, a 20-episode evaluation is conducted every 4,000 steps and we record the average steps per episode (ASPE) performance of the tested method at each evaluation point. ... The shaping weights of our methods are initialized to 1. |
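
The shaping-weight initialization quoted in the experiment-setup row reflects the paper's core mechanism: a learnable weight on the shaping reward lets the agent exploit beneficial shaping signals and ignore, or even invert, harmful ones. The sketch below illustrates only the weighted-reward composition; it is not the authors' implementation (no code is released, per the table), `WeightedShaping` is a hypothetical name, and the paper learns the weight `z` rather than holding it fixed.

```python
class WeightedShaping:
    """Minimal sketch: environment reward plus a weighted shaping term.

    The paper learns the weight z (initialized to 1, per the table);
    here it is just a parameter a higher-level learner would update.
    """

    def __init__(self, init_weight: float = 1.0):
        self.z = init_weight

    def __call__(self, env_reward: float, shaping_reward: float) -> float:
        # z = 1 recovers plain reward shaping; z = 0 ignores the shaping
        # signal; a negative z inverts a harmful shaping reward.
        return env_reward + self.z * shaping_reward
```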
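The experiment-setup row also fixes a training/evaluation schedule. The sketch below illustrates that schedule under stated assumptions: it uses the classic (pre-0.26) OpenAI Gym API, where `reset()` returns an observation and `step()` returns a 4-tuple, and substitutes a random placeholder policy for the paper's PPO base learner; `evaluate_aspe` is a hypothetical helper name.

```python
import gym
import numpy as np

TOTAL_STEPS = 1_200_000   # per-method training budget (from the paper)
EVAL_EVERY = 4_000        # a 20-episode evaluation every 4,000 steps
EVAL_EPISODES = 20


def evaluate_aspe(env, policy, episodes=EVAL_EPISODES):
    """Average steps per episode (ASPE), the metric recorded in the paper."""
    lengths = []
    for _ in range(episodes):
        obs, done, t = env.reset(), False, 0
        while not done:
            obs, _, done, _ = env.step(policy(obs))
            t += 1
        lengths.append(t)
    return float(np.mean(lengths))


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    policy = lambda obs: env.action_space.sample()  # stand-in for PPO
    for step in range(EVAL_EVERY, TOTAL_STEPS + 1, EVAL_EVERY):
        # ... EVAL_EVERY steps of PPO training would run here ...
        print(f"step {step}: ASPE = {evaluate_aspe(env, policy):.1f}")
```

Recording ASPE at every evaluation point, rather than only at the end of training, is what produces the learning curves the paper compares across shaping methods.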