Multi-Objective Reinforcement Learning for Designing Ethical Environments

Authors: Manel Rodriguez-Soto, Maite Lopez-Sanchez, Juan A. Rodriguez-Aguilar

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate it with a simple example that embeds the moral value of civility. As to future work, we would like to further examine empirically our algorithm in more complex environments."
Researcher Affiliation | Academia | Manel Rodriguez-Soto (Artificial Intelligence Research Institute, IIIA-CSIC, Bellaterra, Spain), Maite Lopez-Sanchez (Universitat de Barcelona, UB, Barcelona, Spain), Juan A. Rodriguez-Aguilar (IIIA-CSIC, Bellaterra, Spain)
Pseudocode | Yes | Algorithm 1, Ethical Embedding (a hedged Python sketch of the weight-finding step follows the table):
  1: function EMBEDDING(Ethical MOMDP M = ⟨S, A, (R_0, R_N, R_E), T⟩)
  2:   Compute PCH(M), the partial convex hull of M, for weight vectors w = (1, w_E, w_E) with w_E > 0.
  3:   Find Π, the set of ethical-optimal policies within PCH(M), by solving Eq. 4.
  4:   Find a value of w_E that satisfies Eq. 5.
  5:   Return the MDP M' = ⟨S, A, R_0 + w_E(R_N + R_E), T⟩.
  6: end function
Open Source Code | Yes | Programmed in Python. Code available at https://gitlab.iiia.csic.es/Rodriguez/morl-for-ethical-environments.
Open Datasets | No | The paper uses a custom-designed environment, 'The Public Civility Game', for illustration, not a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes a reinforcement learning environment ('The Public Civility Game') and illustrates its application, but it does not specify traditional training, validation, or test splits as used in supervised learning.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments or simulations.
Software Dependencies | No | The paper mentions "Programmed in Python" and uses Q-Learning, but it does not specify version numbers for Python or for any libraries or frameworks.
Experiment Setup | No | The paper mentions setting up agent L to learn with Q-Learning, but it does not provide hyperparameters (e.g., learning rate, discount factor, exploration rate, number of episodes) or other training configuration details (see the Q-Learning sketch below).
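
To make the Ethical Embedding pseudocode above concrete, here is a minimal Python sketch of its weight-finding step (step 4). Since Eq. 5 itself is not reproduced on this page, the sketch assumes its role is to pick a w_E under which ethical-optimal policies also maximise the scalarised value; it further assumes the partial convex hull is already computed and given as a list of per-policy value vectors (V_0, V_N, V_E). The name `find_ethical_weight`, the `margin` parameter, and the toy values are illustrative assumptions, not taken from the paper or its code.

```python
def find_ethical_weight(hull, margin=1e-3):
    """Smallest positive weight w_E under which an ethical-optimal policy
    (one maximising V_N + V_E on the partial convex hull) also maximises
    the scalarised value V_0 + w_E * (V_N + V_E).

    `hull` is a list of (V_0, V_N, V_E) value vectors, one per policy on
    the partial convex hull. A sketch of step 4 of Algorithm 1, not the
    authors' implementation.
    """
    best_eth = max(v_n + v_e for _, v_n, v_e in hull)
    # Individual value of the best ethical-optimal policy on the hull.
    v0_star = max(v0 for v0, v_n, v_e in hull if v_n + v_e == best_eth)
    w = 0.0
    for v0, v_n, v_e in hull:
        eth = v_n + v_e
        if eth < best_eth:
            # Require v0_star + w * best_eth > v0 + w * eth, i.e.
            # w > (v0 - v0_star) / (best_eth - eth).
            w = max(w, (v0 - v0_star) / (best_eth - eth))
    return w + margin  # strictly positive, as Algorithm 1 requires w_E > 0


# Toy two-policy hull (values invented for illustration): a selfish policy
# with high individual return and an ethical policy with lower V_0.
hull = [(10.0, -5.0, 0.0), (8.0, 0.0, 1.0)]
w_E = find_ethical_weight(hull)
print(f"w_E = {w_E:.3f}")  # the returned MDP uses R_0 + w_E * (R_N + R_E)
```

With such a w_E, the single-objective MDP returned in step 5 makes the ethical policy strictly preferred by any standard single-objective learner.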
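
Since the paper reports no Q-Learning configuration, the following sketch shows the kind of tabular Q-Learning loop agent L could run on the embedded (scalarised) reward. The `Corridor` environment is a toy stand-in for the Public Civility Game grid, and every hyperparameter value (alpha, gamma, epsilon, episode count) is a placeholder assumption, not a value from the paper.

```python
import random


class Corridor:
    """Toy 1-D episodic environment standing in for the Public Civility
    Game grid; rewards here are assumed already scalarised as
    R_0 + w_E * (R_N + R_E)."""

    actions = ("left", "right")

    def __init__(self, length=5):
        self.length = length

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == "right" else -1
        self.pos = max(0, min(self.length - 1, self.pos))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else -0.01), done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Standard tabular Q-Learning with epsilon-greedy exploration.
    All hyperparameter defaults are assumptions."""
    Q = {}  # (state, action) -> estimated value, defaulting to 0.0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q.get((s, act), 0.0))
            s2, r, done = env.step(a)
            # One-step Q-Learning update toward the bootstrapped target.
            best_next = max(Q.get((s2, act), 0.0) for act in env.actions)
            target = r if done else r + gamma * best_next
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s = s2
    return Q


Q = q_learning(Corridor())
```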