Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Constraint-based Diversification of JOP Gadgets
Authors: Rodothea Myrsini Tsoupidi, Roberto Castañeda Lozano, Benoit Baudry
JAIR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DivCon with 20 functions from a popular benchmark suite for embedded systems. These experiments show that DivCon's combination of LNS and our application-specific distance measure generates binary programs that are highly resilient against JOP attacks (they share between 0.15% to 8% of JOP gadgets) with an optimality gap of 10%. Our results confirm that there is a trade-off between the quality of each assembly code version and the diversity of the entire pool of versions. |
| Researcher Affiliation | Academia | Rodothea Myrsini Tsoupidi EMAIL Royal Institute of Technology, KTH, Stockholm, Sweden Roberto Castañeda Lozano EMAIL University of Edinburgh, Edinburgh, United Kingdom Benoit Baudry EMAIL Royal Institute of Technology, KTH, Stockholm, Sweden |
| Pseudocode | Yes | Algorithm 1: Incremental algorithm for generating diverse solutions Algorithm 2: Decomposition-based incremental algorithm for generating diverse solutions |
| Open Source Code | Yes | To summarize, the main contributions of this paper are: ... and a publicly available tool for constraint-based software diversification1. Footnote 1: https://github.com/romits800/divcon |
| Open Datasets | Yes | We evaluate the ability of DivCon to generate program variants with 20 functions sampled randomly from MediaBench (Lee et al., 1997). |
| Dataset Splits | No | The paper describes using benchmark functions from MediaBench to evaluate the code diversification technique. However, it does not mention training/test/validation dataset splits, as the experiments involve generating code variants for these functions rather than training a machine learning model on a dataset that would typically require such splits. |
| Hardware Specification | Yes | Host platform. All experiments run on an Intel® Core™ i9-9920X processor at 3.50GHz with 64GB of RAM running Debian GNU/Linux 10 (buster). |
| Software Dependencies | Yes | DivCon relies on Unison's solver portfolio that includes Gecode v6.2 (Gecode Team, 2020) and Chuffed v0.10.3 (Chu, 2011) to find optimal binary programs. |
| Experiment Setup | Yes | The experiments focus on speed optimization and aim to generate 200 variants within a timeout. Parameter h in Algorithms 1 and 2 is set to one... LNS uses restart-based search with a limit of 1000 failures and a relax rate of 60%... The relax rate is selected empirically based on preliminary experiments (Appendix A). ... The time limit for this experiment is 20 minutes. |
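The gadget-sharing percentages quoted under Research Type (0.15% to 8% of JOP gadgets shared between variants) imply a pairwise set-overlap measure. The following is a minimal sketch, assuming gadgets are modeled as sets of instruction sequences and that sharing is normalized by the union of both gadget sets; the paper may normalize differently (e.g., relative to one variant's gadget count).

```python
# Hypothetical illustration of a pairwise gadget-sharing metric.
# Gadget sets are modeled as sets of instruction-sequence strings;
# this is NOT DivCon's exact definition, only a plausible sketch.

def shared_gadget_ratio(gadgets_a: set, gadgets_b: set) -> float:
    """Fraction of all distinct gadgets that appear in both variants."""
    if not gadgets_a and not gadgets_b:
        return 0.0
    return len(gadgets_a & gadgets_b) / len(gadgets_a | gadgets_b)

# Two toy binary variants: one gadget in common, three distinct overall.
variant1 = {"pop rdi; jmp rax", "mov rax, rbx; jmp rcx"}
variant2 = {"pop rdi; jmp rax", "add rsp, 8; jmp rdx"}
print(shared_gadget_ratio(variant1, variant2))  # 1 shared out of 3 total
```

A low ratio across the pool of variants corresponds to the paper's claim of high resilience against JOP attacks, since an attacker's gadget chain built for one variant is unlikely to transfer to another.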
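The table's Pseudocode and Experiment Setup rows reference an incremental algorithm (Algorithm 1) in which each accepted solution constrains future ones to lie at least some distance h away. The sketch below illustrates that accept/constrain loop on a toy domain; it is an assumption-laden simplification, not DivCon's implementation: brute-force enumeration and a Hamming distance stand in for the paper's constraint solver (Gecode/Chuffed), LNS, and application-specific distance measure.

```python
from itertools import product


def hamming(a, b):
    """Toy distance: number of positions where two solutions differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_solutions(candidates, feasible, h, max_variants):
    """Incremental diversification in the spirit of Algorithm 1:
    every accepted solution adds a distance constraint that any
    future solution must be at least h away from it."""
    pool = []
    for cand in candidates:
        if not feasible(cand):
            continue  # a real solver would prune this via constraints
        if all(hamming(cand, s) >= h for s in pool):
            pool.append(cand)
            if len(pool) >= max_variants:
                break
    return pool


# Toy domain: all length-3 binary vectors, all feasible, min distance 2.
sols = diverse_solutions(product([0, 1], repeat=3), lambda c: True,
                         h=2, max_variants=4)
print(sols)  # every pair of accepted solutions differs in >= 2 positions
```

In the paper's setting, candidates are register allocations/instruction schedules produced by a constraint solver under a 10% optimality gap, and LNS (restart-based search, 1000-failure limit, 60% relax rate) replaces exhaustive enumeration to reach 200 variants within the time limit.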