Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Proofs and Certificates for Max-SAT

Authors: Matthieu Py, Mohamed Sami Cherif, Djamal Habet

JAIR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Both tools are evaluated on the unweighted and weighted benchmark instances of the 2020 Max-SAT Evaluation.
Researcher Affiliation	Academia	Matthieu Py EMAIL Universit e Clermont Auvergne, Mines Saint-Etienne, CNRS, LIMOS, F-63000 Clermont-Ferrand, France Mohamed Sami Cherif EMAIL Djamal Habet EMAIL Aix-Marseille Universit e, Universit e de Toulon, CNRS, LIS, Marseille, France
Pseudocode	Yes	Algorithm 1 MS-Builder Require: Formula φ Ensure: Max-SAT certiﬁcate c for φ... Algorithm 2 MS-Checker Require: Formula φ, Max-SAT certiﬁcate C for φ Ensure: True if C is a valid certiﬁcate for φ, False otherwise
Open Source Code	Yes	We have implemented MS-Builder and MS-Checker in C++2. the source code is available on https://pageperso.lis-lab.fr/matthieu.py/en/software.html
Open Datasets	Yes	Both tools are experimentally evaluated on the unweighted and weighted benchmarks of the 2020 Max-SAT Evaluation (Bacchus et al., 2020).
Dataset Splits	No	The paper mentions using 'the benchmark of the unweighted partial track and of the weighted partial track of the 2020 Max-SAT Evaluation', which refers to a set of instances for solver evaluation, not a dataset with traditional training/test/validation splits. Therefore, no such split information is provided.
Hardware Specification	Yes	The experiments are performed on Dell Power Edge M620 servers with Intel Xeon Silver E5-2609 processors (clocked at 2.5 2.6 GHz) under Ubuntu 18.04. Each solving process is allocated a slot of 1 hour and at most 16 GB of memory per instance.
Software Dependencies	No	Resolution Refutations are computed using Booleforce (Biere, 2010; E en & S orensson, 2003) and Tracecheck (Biere, 2006, 2008). While software tools are named, specific version numbers for Booleforce and Tracecheck are not explicitly provided in the text.
Experiment Setup	Yes	Each solving process is allocated a slot of 1 hour and at most 16 GB of memory per instance.