Multi-Objective Reinforcement Learning for Designing Ethical Environments
Authors: Manel Rodriguez-Soto, Maite Lopez-Sanchez, Juan A. Rodriguez-Aguilar
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate it with a simple example that embeds the moral value of civility. As to future work, we would like to further examine empirically our algorithm in more complex environments. |
| Researcher Affiliation | Academia | Manel Rodriguez-Soto1 , Maite Lopez-Sanchez2 , Juan A. Rodriguez-Aguilar1 1Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain 2Universitat de Barcelona (UB), Barcelona, Spain |
| Pseudocode | Yes | Algorithm 1 Ethical Embedding<br>1: function EMBEDDING(Ethical MOMDP M = ⟨S, A, (R0, RN, RE), T⟩)<br>2: Compute PCH(M), the partial convex hull of M, for weight vectors w = (1, wE, wE) with wE > 0.<br>3: Find Π, the set of ethical-optimal policies within PCH(M), by solving Eq. 4.<br>4: Find a value for wE that satisfies Eq. 5.<br>5: Return MDP M′ = ⟨S, A, R0 + wE(RN + RE), T⟩.<br>6: end function |
| Open Source Code | Yes | Programmed in Python. Code available at https://gitlab.iiia.csic.es/Rodriguez/morl-for-ethical-environments. |
| Open Datasets | No | The paper uses a custom-designed environment, 'The Public Civility Game', for illustration rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes a reinforcement learning environment ('The Public Civility Game') and illustrates its application, but it does not specify traditional training, validation, or test dataset splits as seen in supervised learning tasks. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions "Programmed in Python" and uses "Q-Learning", but it does not specify any version numbers for Python or any libraries/frameworks used. |
| Experiment Setup | No | The paper mentions setting up "agent L to learn with Q-Learning" but does not provide specific hyperparameters (e.g., learning rate, discount factor, exploration rate, episodes) or other detailed training configurations for the Q-Learning process. |
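The embedding step in Algorithm 1 can be illustrated with a short sketch: scalarize the multi-objective reward (R0, RN, RE) into the single reward R0 + wE·(RN + RE) and solve the resulting single-objective MDP with tabular Q-Learning, as the paper does. The toy states, actions, reward values, and the weight wE below are illustrative assumptions, not the paper's Public Civility Game or its (unreported) hyperparameters.

```python
import random

# Hypothetical one-state toy problem (NOT the Public Civility Game):
# two actions with multi-objective rewards (R0, RN, RE), i.e.
# individual reward, norm reward, and ethical reward.
ACTIONS = ["selfish", "civil"]
REWARDS = {
    "selfish": (1.0, -1.0, 0.0),  # higher individual gain, violates the norm
    "civil":   (0.5,  0.0, 1.0),  # slightly costlier, ethically praiseworthy
}

def scalarize(r, w_e):
    """Ethical embedding: collapse (R0, RN, RE) into R0 + wE * (RN + RE)."""
    r0, rn, re_ = r
    return r0 + w_e * (rn + re_)

def q_learning(w_e, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-Learning on the scalarized single-objective problem."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        r = scalarize(REWARDS[a], w_e)
        # single-state, single-step episodes: no bootstrapping term
        q[a] += alpha * (r - q[a])
    return q

# With a sufficiently large ethical weight wE, the civil action dominates.
q = q_learning(w_e=1.0)
best = max(q, key=q.get)
```

In this toy setting, wE = 1.0 makes the civil action's scalarized reward (0.5 + 1.0) exceed the selfish one's (1.0 − 1.0), so the learned greedy policy is civil; the paper's Eq. 5 is precisely a condition for choosing wE large enough that this holds in general.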