Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Mitigating Adversarial Norm Training with Moral Axioms

Authors: Taylor Olson, Kenneth D. Forbus

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach by drawing inspiration from a study commonly used in moral development research. This questionnaire aims to test an agent's ability to reason to moral conclusions despite opposed testimony. Our findings suggest that our model can still correctly evaluate moral situations and learn conventions in an adversarial training environment.
Researcher Affiliation	Academia	Taylor Olson, Kenneth D. Forbus Northwestern University EMAIL, EMAIL
Pseudocode	Yes	Definition (CPI: Conditional Principle of Inheritance). If an agent knows that a conjunction (world) is obligatory given certain contextual preconditions, then the agent knows that every more general conjunction (world) is also obligatory in all more specific contexts. (<== (knows Eval Agent Mt y c OBL) (knows Eval Agent Mt x b OBL) (implies x y) (implies c b))
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We obtained 133 action descriptions of transgressions paired with their domain type (moral vs conventional) from multiple MCT studies (Aharoni et al. 2011; Kagan and Lamb 1990 Tables 4.2-6).
Dataset Splits	No	The paper mentions a training phase ('MCT-Training') and a testing phase ('MCT-Testing') but does not explicitly define or refer to a separate validation dataset split.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions 'FIRE reasoning engine (Forbus et al. 2010)' and 'CNLU (Tomai and Forbus 2009)' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	The default belief threshold is 0.9. To test the models, we draw upon various instances of the Moral-Conventional Transgressions (MCT) task (Sousa 2009). This task is used in moral development research to test, among others, four important dimensions of norms: permissibility, seriousness, authority contingency, and generality.
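The CPI rule quoted under Pseudocode is a Horn clause: from (knows Eval Agent Mt x b OBL), (implies x y) and (implies c b), conclude (knows Eval Agent Mt y c OBL). The sketch below is a minimal illustration of that rule, not the authors' FIRE implementation. It assumes that worlds and contexts are conjunctions modeled as Python frozensets of atomic propositions, and that "a implies b" holds when b's conjuncts are a subset of a's. The names Eval, implies and apply_cpi, and the example propositions, are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Iterable, NamedTuple, Set

# Hypothetical encoding: a conjunction is a set of atomic propositions.
Conj = FrozenSet[str]


class Eval(NamedTuple):
    """One known moral evaluation (hypothetical stand-in for the FIRE fact)."""
    agent: str
    mt: str        # microtheory (reasoning context) holding the belief
    world: Conj    # the conjunction being evaluated
    context: Conj  # contextual preconditions
    value: str     # e.g. "OBL" for obligatory


def implies(a: Conj, b: Conj) -> bool:
    """Assumption: a conjunction implies any conjunction of a subset of its conjuncts."""
    return b <= a


def apply_cpi(known: Set[Eval], worlds: Iterable[Conj],
              contexts: Iterable[Conj]) -> Set[Eval]:
    """Apply CPI over finite candidate worlds/contexts.

    One pass suffices here: subset-based implication is transitive, so
    applying CPI again to the derived facts yields nothing new.
    """
    worlds, contexts = list(worlds), list(contexts)
    derived = set(known)
    for e in known:
        if e.value != "OBL":
            continue
        for y, c in product(worlds, contexts):
            # x implies y (y is more general); c implies b (c is more specific)
            if implies(e.world, y) and implies(c, e.context):
                derived.add(Eval(e.agent, e.mt, y, c, "OBL"))
    return derived


# Usage with made-up propositions.
f = frozenset
known = {Eval("me", "NormsMt", f({"help", "smile"}), f({"atSchool"}), "OBL")}
worlds = [f({"help", "smile"}), f({"help"}), f({"smile"})]
contexts = [f({"atSchool"}), f({"atSchool", "raining"}), f({"atHome"})]
result = apply_cpi(known, worlds, contexts)
```

With these inputs the obligation spreads to each more general world ({help}, {smile}) and to the more specific context {atSchool, raining}, giving six obligatory evaluations; the unrelated context {atHome} is never reached because it does not imply {atSchool}.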
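The entry reports a default belief threshold of 0.9 but does not say how belief strength is computed. Purely as an illustration of how such a threshold can gate norm adoption under mixed or adversarial testimony, the sketch below assumes belief strength is the fraction of testimonies supporting a norm. That scoring rule, the function names, and the example norms are hypothetical and should not be read as the paper's method.

```python
from typing import Dict, Set, Tuple

DEFAULT_BELIEF_THRESHOLD = 0.9  # default value reported in the experiment setup


def belief_strength(supporting: int, opposing: int) -> float:
    """Hypothetical score: fraction of testimonies that support a norm."""
    total = supporting + opposing
    if total == 0:
        return 0.0  # no evidence, no belief
    return supporting / total


def accepted_norms(testimony: Dict[str, Tuple[int, int]],
                   threshold: float = DEFAULT_BELIEF_THRESHOLD) -> Set[str]:
    """Return the norms whose (hypothetical) belief strength meets the threshold."""
    return {norm for norm, (sup, opp) in testimony.items()
            if belief_strength(sup, opp) >= threshold}


# Usage: (supporting, opposing) counts per candidate norm, made up for illustration.
testimony = {
    "hitting is wrong": (9, 1),
    "wear a uniform": (8, 2),
    "lying is wrong": (10, 0),
}
adopted = accepted_norms(testimony)
```

Under this assumed scoring, a norm with 9 of 10 supporting testimonies just meets the 0.9 threshold, while one with 8 of 10 is rejected, so adopted contains "hitting is wrong" and "lying is wrong".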