Multi-Objective Reinforcement Learning for Designing Ethical Environments

Authors: Manel Rodriguez-Soto, Maite Lopez-Sanchez, Juan A. Rodriguez-Aguilar

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate it with a simple example that embeds the moral value of civility. As to future work, we would like to further examine empirically our algorithm in more complex environments."
Researcher Affiliation | Academia | Manel Rodriguez-Soto (Artificial Intelligence Research Institute, IIIA-CSIC, Bellaterra, Spain), Maite Lopez-Sanchez (Universitat de Barcelona, UB, Barcelona, Spain), Juan A. Rodriguez-Aguilar (IIIA-CSIC, Bellaterra, Spain)
Pseudocode | Yes | Algorithm 1, Ethical Embedding (a hedged Python sketch of the weight-finding step follows the table):
  1: function EMBEDDING(Ethical MOMDP M = ⟨S, A, (R_0, R_N, R_E), T⟩)
  2:   Compute PCH(M), the partial convex hull of M, for weight vectors w = (1, w_E, w_E) with w_E > 0.
  3:   Find Π, the set of ethical-optimal policies within PCH(M), by solving Eq. 4.
  4:   Find a value of w_E that satisfies Eq. 5.
  5:   Return the MDP M' = ⟨S, A, R_0 + w_E(R_N + R_E), T⟩.
  6: end function
Open Source Code | Yes | Programmed in Python. Code available at https://gitlab.iiia.csic.es/Rodriguez/morl-for-ethical-environments.
Open Datasets | No | The paper uses a custom-designed environment, 'The Public Civility Game', for illustration, not a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes a reinforcement learning environment ('The Public Civility Game') and illustrates its application, but it does not specify traditional training, validation, or test splits as used in supervised learning.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments or simulations.
Software Dependencies | No | The paper mentions "Programmed in Python" and uses Q-Learning, but it does not specify version numbers for Python or for any libraries or frameworks.
Experiment Setup | No | The paper mentions setting up agent L to learn with Q-Learning, but it does not provide hyperparameters (e.g., learning rate, discount factor, exploration rate, number of episodes) or other training configuration details (see the Q-Learning sketch below).
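
To make the Ethical Embedding pseudocode above concrete, here is a minimal Python sketch of its weight-finding step (step 4). Since Eq. 5 itself is not reproduced on this page, the sketch assumes its role is to pick a w_E under which ethical-optimal policies also maximise the scalarised value; it further assumes the partial convex hull is already computed and given as a list of per-policy value vectors (V_0, V_N, V_E). The name `find_ethical_weight`, the `margin` parameter, and the toy values are illustrative assumptions, not taken from the paper or its code.

```python
def find_ethical_weight(hull, margin=1e-3):
    """Smallest positive weight w_E under which an ethical-optimal policy
    (one maximising V_N + V_E on the partial convex hull) also maximises
    the scalarised value V_0 + w_E * (V_N + V_E).

    `hull` is a list of (V_0, V_N, V_E) value vectors, one per policy on
    the partial convex hull. A sketch of step 4 of Algorithm 1, not the
    authors' implementation.
    """
    best_eth = max(v_n + v_e for _, v_n, v_e in hull)
    # Individual value of the best ethical-optimal policy on the hull.
    v0_star = max(v0 for v0, v_n, v_e in hull if v_n + v_e == best_eth)
    w = 0.0
    for v0, v_n, v_e in hull:
        eth = v_n + v_e
        if eth < best_eth:
            # Require v0_star + w * best_eth > v0 + w * eth, i.e.
            # w > (v0 - v0_star) / (best_eth - eth).
            w = max(w, (v0 - v0_star) / (best_eth - eth))
    return w + margin  # strictly positive, as Algorithm 1 requires w_E > 0


# Toy two-policy hull (values invented for illustration): a selfish policy
# with high individual return and an ethical policy with lower V_0.
hull = [(10.0, -5.0, 0.0), (8.0, 0.0, 1.0)]
w_E = find_ethical_weight(hull)
print(f"w_E = {w_E:.3f}")  # the returned MDP uses R_0 + w_E * (R_N + R_E)
```

With such a w_E, the single-objective MDP returned in step 5 makes the ethical policy strictly preferred by any standard single-objective learner.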
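
Since the paper reports no Q-Learning configuration, the following sketch shows the kind of tabular Q-Learning loop agent L could run on the embedded (scalarised) reward. The `Corridor` environment is a toy stand-in for the Public Civility Game grid, and every hyperparameter value (alpha, gamma, epsilon, episode count) is a placeholder assumption, not a value from the paper.

```python
import random


class Corridor:
    """Toy 1-D episodic environment standing in for the Public Civility
    Game grid; rewards here are assumed already scalarised as
    R_0 + w_E * (R_N + R_E)."""

    actions = ("left", "right")

    def __init__(self, length=5):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        self.pos = max(0, min(self.length - 1, self.pos))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else -0.01), done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Standard tabular Q-Learning with epsilon-greedy exploration.
    All hyperparameter defaults are assumptions."""
    Q = {}  # (state, action) -> estimated value, defaulting to 0.0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = env.step(a)
            # One-step Q-Learning update toward the bootstrapped target.
            best_next = max(Q.get((s2, act), 0.0) for act in env.actions)
            target = r if done else r + gamma * best_next
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s = s2
    return Q


Q = q_learning(Corridor())
```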