A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents

Authors: Yueh-Hua Wu, Shou-De Lin

AAAI 2018

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of ethics shaping by conducting experiments in three scenarios, Grab a Milk, Driving and Avoiding, and Driving and Rescuing. These schemes are designed to show how the learner's behavior could be altered by ethics shaping while facing matters happening in our daily lives. We further claim that ethics shaping ought to overcome or alleviate ethical problems such as side effects caused by optimizing the original objective functions (Taylor et al. 2016) and dangerous exploration (Amodei et al. 2016), which will be confirmed by the experiment results.
Researcher Affiliation Academia Yueh-Hua Wu, Shou-De Lin Department of Computer Science and Information Engineering, National Taiwan University Taipei 10617, Taiwan d06922005@ntu.edu.tw, sdlin@csie.ntu.edu.tw
Pseudocode No The paper describes the 'Ethics Shaping' method using mathematical equations (e.g., equation 6 for shaping reward) but does not present the method in a structured pseudocode or algorithm block.
Open Source Code No The paper does not include any explicit statements about releasing source code or provide links to a code repository for the methodology described.
Open Datasets No The paper describes three custom scenarios ('Grab a Milk', 'Driving and Avoiding', 'Driving and Rescuing') and mentions human trajectories 'synthesized by random walk'. It does not refer to any established public datasets or provide access information (links, DOIs, citations with author/year) for any data used.
Dataset Splits No The paper mentions that 'the best performances are reported in terms of learning rate α, discount factor γ, and the scale parameters cn, cp in shaping reward H', which implies hyperparameter tuning. However, it does not explicitly state dataset splits (e.g., train/validation/test percentages or counts) or cross-validation setup for reproducibility of data partitioning.
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions that the 'ethics shaping algorithm can make the SARSA algorithm perform more ethically' and that 'ϵ-greedy is used for exploration', but it does not list any specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch, or their specific versions).
Experiment Setup Yes For algorithms with and without ethics shaping, the best performances are reported in terms of learning rate α, discount factor γ, and the scale parameters cn, cp in shaping reward H. ... In the experiments, all human policies are synthesized by random walk with ethical rules and the confidence level of human feedback C is set to 0.95 since we would like to focus on how much ethics shaping can influence reinforcement learners.
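The quoted setup describes tabular SARSA with ε-greedy exploration, augmented by a shaping reward H scaled by c_p and c_n and weighted by a confidence level C = 0.95. A minimal sketch of what such a shaped SARSA update might look like is below; since the paper's Equation 6 is not reproduced in this report, the exact form of the shaping term, along with all function and parameter names, is an assumption for illustration, not the authors' implementation.

```python
def shaping_reward(state, action, human_policy, c_p=0.5, c_n=0.5, confidence=0.95):
    """Illustrative ethics-shaping term H(s, a) (hypothetical form).

    Rewards agreement with the synthesized human policy and penalizes
    disagreement, weighted by the confidence level C. The paper's exact
    Equation 6 may differ; this is only a sketch of the general idea.
    """
    if human_policy.get(state) == action:
        return confidence * c_p
    return -confidence * c_n

def sarsa_update(Q, s, a, r_env, s_next, a_next, human_policy,
                 alpha=0.1, gamma=0.9):
    """One tabular SARSA step where the environment reward is augmented
    with the shaping term before the TD update (names are illustrative)."""
    r = r_env + shaping_reward(s, a, human_policy)
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q

# Toy usage: the synthesized human policy prefers 'left' in state 's0'.
human_policy = {"s0": "left"}
Q = {("s0", "left"): 0.0, ("s0", "right"): 0.0, ("s1", "left"): 0.0}
sarsa_update(Q, "s0", "left", 1.0, "s1", "left", human_policy)
# Shaped reward: 1.0 + 0.95 * 0.5 = 1.475, so Q[("s0", "left")] = 0.1475
```

The sketch only shows where the shaping term enters the TD target; a full reproduction would also need the three scenario environments and the random-walk human policies described above.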