Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Markov Constraint as Large Language Model Surrogate
Authors: Alexandre Bonlarron, Jean-Charles Régin
IJCAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that the generated text is valued in a similar way to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams. |
| Researcher Affiliation | Academia | 1 Université Côte d'Azur, Inria, France; 2 Université Côte d'Azur, CNRS, I3S, France |
| Pseudocode | No | The paper describes algorithms and filtering criteria in prose and through diagrams, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The approach described in Sec. 3 is implemented in Java 17. The code is available upon request. |
| Open Datasets | No | The paper mentions using "n-grams extracted from french books" and references a GPT model from HuggingFace, but it does not provide concrete access information (link, citation, repository) to the specific corpus of French books used to extract the n-grams for their experiments. |
| Dataset Splits | No | The paper does not provide specific training, validation, and test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. It discusses |
| Hardware Specification | Yes | Generation: The generation experiments were performed on a machine using an Intel(R) Xeon(R) W-2175 CPU @ 2.50GHz with 256 GB of RAM and running under Ubuntu 18.04. Inference: The LLM inference experiments were performed on a machine using an AMD EPYC 7313 16-Core CPU @ 3GHz with 512 GB of RAM and an A100 GPU running under Ubuntu 20.04.6 LTS. |
| Software Dependencies | Yes | The approach described in Sec. 3 is implemented in Java 17. |
| Experiment Setup | Yes | The paper details various filtering criteria such as Instant Threshold, Gliding Threshold (with parameter λ), and Look-a-head Filtering. It also explains how the threshold T is defined using mean (µ) and standard deviation (σ) of the n-gram distribution, and introduces `Cslack` and `λ` factors for fine-tuning. Tables 2, 3, and 4 present results for different values of λ. |
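
The Experiment Setup entry says the threshold T is derived from the mean (µ) and standard deviation (σ) of the n-gram distribution and tuned with a λ factor, but this summary does not reproduce the exact formula. The sketch below is only an illustration under a stated assumption: it takes T = µ + λσ over per-n-gram costs (for example, negative log-probabilities) and keeps the n-grams whose cost does not exceed T. The function names, the cost dictionary, and the formula itself are hypothetical and are not taken from the paper or its Java implementation.

```python
from statistics import mean, pstdev


def threshold(costs, lam):
    """Assumed form T = mu + lam * sigma (not confirmed by the paper)."""
    mu = mean(costs)
    sigma = pstdev(costs)  # population standard deviation of the costs
    return mu + lam * sigma


def filter_by_threshold(ngram_costs, lam):
    """Keep only the n-grams whose cost is at most the assumed threshold T."""
    t = threshold(list(ngram_costs.values()), lam)
    return {gram: cost for gram, cost in ngram_costs.items() if cost <= t}


# Hypothetical toy costs; a higher cost means a less likely n-gram.
example_costs = {
    "a b c": 1.0,
    "b c d": 2.0,
    "c d e": 3.0,
    "d e f": 10.0,
}

kept_strict = filter_by_threshold(example_costs, lam=1.0)
kept_loose = filter_by_threshold(example_costs, lam=2.0)
```

With these toy values, a smaller λ gives a tighter threshold and discards the high-cost n-gram, while a larger λ keeps all four. This mirrors the general trade-off that the results for different λ values in Tables 2, 3, and 4 explore, without claiming to match their setup.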