Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment

Authors: Wonje Jeung, Yoon Sangyeon, Minsuk Kahng, Albert No

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results across multiple benchmarks indicate that SAFEPATH effectively reduces harmful outputs while maintaining reasoning performance.
Researcher Affiliation	Academia	Wonje Jeung¹, Sangyeon Yoon¹, Minsuk Kahng², Albert No¹. ¹Department of Artificial Intelligence, Yonsei University; ²Department of Computer Science and Engineering, Yonsei University. EMAIL
Pseudocode	No	The paper describes the SAFEPATH method and its training process in detail, including specific instruction formats and logic, but it does not present these steps within a formal pseudocode block or algorithm environment.
Open Source Code	Yes	We release model and code at https://ai-isl.github.io/safepath.
Open Datasets	Yes	We use WildJailbreak [Jiang et al., 2024] as the Safety Trigger set and Deep Seek Math 220K [Guo et al., 2025] as the Reasoning Retain set.
Dataset Splits	Yes	The R-7B model is trained on 400 Safety Trigger set samples for 100 steps with a batch size of 4, without using the Reasoning Retain set. The R-8B model is trained on 40 samples from each set (80 total) for 20 steps with a batch size of 4.
Hardware Specification	Yes	All experiments were conducted on a system with 512 CPU cores, 8 Nvidia RTX L40S (48GB) GPUs, and 1024 GB of RAM.
Software Dependencies	No	The paper mentions using tools like 'lm-evaluation-harness' and 'AI2 evaluation codebase' but does not specify their version numbers or other key software dependencies with specific versions.
Experiment Setup	Yes	Both datasets are trained with a learning rate of 1 × 10⁻⁵. The R-7B model is trained on 400 Safety Trigger set samples for 100 steps with a batch size of 4... The R-8B model is trained on 40 samples from each set (80 total) for 20 steps with a batch size of 4.
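
The training budgets quoted in the Dataset Splits and Experiment Setup rows can be cross-checked with simple arithmetic: steps times batch size gives the number of samples consumed. The sketch below is illustrative only. The `TrainingSetup` class and its field names are hypothetical and do not come from the paper or its code release, and it assumes the reported batch size is the effective batch size per optimizer step (no gradient accumulation).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the figures reported in the table."""
    name: str
    num_samples: int  # total training samples reported
    steps: int        # optimizer steps reported
    batch_size: int   # samples per step (assumed to be the effective batch size)

    def samples_consumed(self) -> int:
        # Each step processes exactly one batch.
        return self.steps * self.batch_size

    def epochs(self) -> float:
        # Number of passes over the training data implied by the figures.
        return self.samples_consumed() / self.num_samples


# Values copied from the Dataset Splits / Experiment Setup rows.
SETUPS = [
    TrainingSetup("R-7B", num_samples=400, steps=100, batch_size=4),
    TrainingSetup("R-8B", num_samples=80, steps=20, batch_size=4),
]

for s in SETUPS:
    print(f"{s.name}: {s.samples_consumed()} samples consumed, "
          f"{s.epochs():.1f} epoch(s)")
```

Under these assumptions, both configurations amount to a single pass over their training samples, which is consistent with the sample counts and step counts reported in the table.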