Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Vocabulary Alignment in Openly Specified Interactions
Authors: Paula Daniela Chocron, Marco Schorlemmer
JAIR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present two techniques that can be used either to learn an alignment from scratch or to repair an existent one, and we evaluate their performance experimentally. |
| Researcher Affiliation | Academia | Paula Chocron (EMAIL), Artificial Intelligence Research Institute, IIIA-CSIC, Bellaterra (Barcelona), Catalonia, Spain; Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Catalonia, Spain. Marco Schorlemmer (EMAIL), Artificial Intelligence Research Institute, IIIA-CSIC, Bellaterra (Barcelona), Catalonia, Spain. |
| Pseudocode | No | The paper describes methods and techniques in prose and mathematical notation (e.g., definitions, equations) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to source code repositories, an explicit statement of code release, or mention code in supplementary materials for the methodology described. |
| Open Datasets | No | We evaluate the techniques that we propose with randomly generated data, which allows us to abstract away from any implementation details. |
| Dataset Splits | No | A run of an experiment consists of two agents a1 and a2 with vocabularies V1 and V2 who are sequentially given pairs of protocols compatible under one same translation τ. ... We let agents interact 300 times. After each interaction, we measured how close agents were to the correct translation τ. |
| Hardware Specification | No | The vocabularies of 8 words that we used in the experiments were the larger ones for which the technique could be executed on our server. After that, it became prohibitively space-consuming, and was automatically killed. (No specific hardware details are provided, only a general reference to 'our server'). |
| Software Dependencies | No | We used the NuSMV model checker (Cimatti et al., 2002) to perform all the necessary satisfiability checks. (The tool NuSMV is mentioned, but no version number is specified.) |
| Experiment Setup | Yes | A run of an experiment consists of two agents a1 and a2 with vocabularies V1 and V2 who are sequentially given pairs of protocols compatible under one same translation τ. ... We let agents interact 300 times. After each interaction, we measured how close agents were to the correct translation τ. ... We used the values r = 0.3 for the punishment parameter of the simple strategy. Each experiment was repeated 10 times, and we averaged the results. ... We show here the experiments for a vocabulary of 12 words and four different protocol sizes. |
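The Experiment Setup row describes an evaluation harness: pairs of agents interact 300 times per run, the distance to the correct translation τ is measured after each interaction, and results are averaged over 10 repetitions. Since the paper releases no code, the sketch below is purely illustrative: `alignment_distance` and the per-interaction learning dynamic are hypothetical stand-ins for the paper's technique, kept only to show the shape of such an experiment (12-word vocabulary, 300 interactions, 10 averaged runs).

```python
import random

def alignment_distance(candidate, tau):
    """Fraction of vocabulary words whose mapping still disagrees with tau
    (a hypothetical closeness metric, not the paper's)."""
    wrong = sum(1 for w, t in tau.items() if candidate.get(w) != t)
    return wrong / len(tau)

def run_experiment(num_interactions=300, learn_prob=0.3, seed=0):
    """One run: an agent gradually acquires the hidden translation tau.

    The learning dynamic here (reveal one correct mapping with some
    probability per interaction) is a placeholder, not the paper's method.
    """
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(12)]   # 12-word vocabulary, as in the experiments
    tau = {w: w.upper() for w in vocab}    # the hidden correct translation
    candidate = {}                         # agent's current belief about tau
    history = []
    for _ in range(num_interactions):
        w = rng.choice(vocab)
        if rng.random() < learn_prob:
            candidate[w] = tau[w]
        history.append(alignment_distance(candidate, tau))
    return history

# Repeat 10 times and average, as described in the experiment setup.
runs = [run_experiment(seed=s) for s in range(10)]
avg_final = sum(r[-1] for r in runs) / len(runs)
```

Under this stand-in dynamic the measured distance is non-increasing within a run, which mirrors the paper's report of agents converging toward the correct translation over repeated interactions.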