Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Objective Reinforcement Learning for Designing Ethical Environments
Authors: Manel Rodriguez-Soto, Maite Lopez-Sanchez, Juan A. Rodriguez-Aguilar
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate it with a simple example that embeds the moral value of civility. As to future work, we would like to further examine empirically our algorithm in more complex environments. |
| Researcher Affiliation | Academia | Manel Rodriguez-Soto (1), Maite Lopez-Sanchez (2), Juan A. Rodriguez-Aguilar (1); (1) Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain; (2) Universitat de Barcelona (UB), Barcelona, Spain |
| Pseudocode | Yes | Algorithm 1 Ethical Embedding. 1: function EMBEDDING(Ethical MOMDP M = ⟨S, A, (R_0, R_N, R_E), T⟩) 2: Compute P ⊆ CH(M), the partial convex hull of M, for weight vectors w = (1, w_E, w_E) with w_E > 0. 3: Find Π, the set of ethical-optimal policies within P, by solving Eq. 4. 4: Find a value for w_E that satisfies Eq. 5. 5: Return MDP M = ⟨S, A, R_0 + w_E(R_N + R_E), T⟩. 6: end function |
| Open Source Code | Yes | Programmed in Python. Code available at https://gitlab.iiia.csic.es/Rodriguez/morl-for-ethical-environments. |
| Open Datasets | No | The paper uses a custom-designed environment, 'The Public Civility Game', for illustration, not a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes a reinforcement learning environment ('The Public Civility Game') and illustrates its application, but it does not specify traditional training, validation, or test dataset splits as seen in supervised learning tasks. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions "Programmed in Python" and uses "Q-Learning", but it does not specify any version numbers for Python or any libraries/frameworks used. |
| Experiment Setup | No | The paper mentions setting up "agent L to learn with Q-Learning" but does not provide specific hyperparameters (e.g., learning rate, discount factor, exploration rate, episodes) or other detailed training configurations for the Q-Learning process. |
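
The last step of Algorithm 1 quoted in the table above, which returns a single-objective MDP with reward R_0 + w_E(R_N + R_E), can be illustrated with a small sketch. This is an illustration under stated assumptions, not the authors' implementation. The two-state environment, the reward values, and the names `embed`, `q_learning`, and `greedy_action` are all hypothetical. So are the hyperparameters (`alpha`, `gamma`, `episodes`, uniform random exploration), since the table reports that the paper does not specify its Q-Learning configuration. The weight `w_e` is supplied by hand here; deriving it from the partial convex hull (steps 2 to 4 of the algorithm) is not shown.

```python
import random

# Hypothetical toy environment, NOT the Public Civility Game from the paper.
# State 0: litter lies ahead of the agent. State 1: the litter has been tidied.
IGNORE, TIDY = 0, 1    # actions available in state 0
WALK, DAWDLE = 0, 1    # actions available in state 1
TERMINAL = None

# (state, action) -> (next_state, (r0, rn, re)):
# r0 = individual reward, rn = normative reward, re = evaluative reward.
TRANSITIONS = {
    (0, IGNORE): (TERMINAL, (10.0, -1.0, 0.0)),  # reach the goal, violate the norm
    (0, TIDY): (1, (-1.0, 0.0, 1.0)),            # lose a step, earn praise
    (1, WALK): (TERMINAL, (10.0, 0.0, 0.0)),     # reach the goal
    (1, DAWDLE): (TERMINAL, (0.0, 0.0, 0.0)),    # give up without reaching the goal
}


def embed(rewards, w_e):
    """Scalarize a reward vector as r0 + w_e * (rn + re)."""
    r0, rn, re = rewards
    return r0 + w_e * (rn + re)


def q_learning(w_e, episodes=500, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning on the embedded reward (placeholder hyperparameters)."""
    rng = random.Random(seed)
    q = {state_action: 0.0 for state_action in TRANSITIONS}
    for _ in range(episodes):
        state = 0
        while state is not TERMINAL:
            action = rng.choice((0, 1))  # uniform random exploration (off-policy)
            next_state, rewards = TRANSITIONS[(state, action)]
            if next_state is TERMINAL:
                future = 0.0
            else:
                future = max(q[(next_state, a)] for a in (0, 1))
            target = embed(rewards, w_e) + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_action(q, state):
    """Return the action with the highest learned Q-value in `state`."""
    return max((0, 1), key=lambda a: q[(state, a)])


if __name__ == "__main__":
    for w_e in (0.0, 1.0):
        q = q_learning(w_e)
        choice = "tidy" if greedy_action(q, 0) == TIDY else "ignore"
        print(f"w_e={w_e}: greedy action in state 0 is {choice}")
```

In this toy setting with `gamma=1.0`, ignoring the litter is worth 10 - w_e and tidying first is worth 9 + w_e, so the ethical behaviour becomes optimal once w_e exceeds 0.5. This mirrors, in miniature, why the algorithm searches for a sufficiently large ethical weight.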