reproducibilityindex.ai

Markov Constraint as Large Language Model Surrogate

Authors: Alexandre Bonlarron, Jean-Charles Régin

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experimental results show that the generated text is valued in a similar way to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
Researcher Affiliation	Academia	1 Université Côte d'Azur, Inria, France; 2 Université Côte d'Azur, CNRS, I3S, France
Pseudocode	No	The paper describes algorithms and filtering criteria in prose and through diagrams, but it does not include a formal pseudocode block or algorithm listing.
Open Source Code	No	The approach described in Sec. 3 is implemented in Java 17. The code is available upon request.
Open Datasets	No	The paper mentions using "n-grams extracted from french books" and references a GPT model from HuggingFace, but it does not provide concrete access information (link, citation, repository) to the specific corpus of French books used to extract the n-grams for their experiments.
Dataset Splits	No	The paper does not provide specific training, validation, and test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. It discusses
Hardware Specification	Yes	Generation: The generation experiments were performed on a machine using an Intel(R) Xeon(R) W-2175 CPU @ 2.50GHz with 256 GB of RAM and running under Ubuntu 18.04. Inference: The LLM inference experiments were performed on a machine using an AMD EPYC 7313 16-Core CPU @ 3GHz with 512 GB of RAM and an A100 GPU running under Ubuntu 20.04.6 LTS.
Software Dependencies	Yes	The approach described in Sec. 3 is implemented in Java 17.
Experiment Setup	Yes	The paper details several filtering criteria: Instant Threshold, Gliding Threshold (with parameter λ), and Look-ahead Filtering. It also explains how the threshold T is defined from the mean (µ) and standard deviation (σ) of the n-gram distribution, and introduces the `Cslack` and `λ` factors for fine-tuning. Tables 2, 3, and 4 present results for different values of λ.
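
The threshold-based filtering summarized in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' Java implementation, and the summary above does not give the exact formulas. The form T = µ + λσ + Cslack, the sliding-window reading of the gliding threshold, and the names `threshold`, `instant_filter`, and `gliding_filter` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def threshold(ngram_costs, lam, c_slack=0.0):
    """Assumed form T = mu + lam * sigma + c_slack (not taken from the paper)."""
    mu = mean(ngram_costs)
    sigma = pstdev(ngram_costs)  # population standard deviation
    return mu + lam * sigma + c_slack


def instant_filter(candidate_costs, t):
    """Hypothetical instant threshold: every n-gram cost must be at most t."""
    return all(cost <= t for cost in candidate_costs)


def gliding_filter(candidate_costs, t, window):
    """Hypothetical gliding threshold: every sliding-window mean must be at most t."""
    if not candidate_costs:
        return True
    if len(candidate_costs) < window:
        return mean(candidate_costs) <= t
    last_start = len(candidate_costs) - window
    return all(
        mean(candidate_costs[i:i + window]) <= t
        for i in range(last_start + 1)
    )


# Illustrative, made-up costs for the n-grams of a corpus.
corpus_costs = [1.0, 2.0, 3.0, 4.0]
t = threshold(corpus_costs, lam=1.0)
```

Under these assumptions, raising λ loosens the threshold and admits more candidate sentences, while lowering it prunes more aggressively. The gliding variant tolerates a single costly n-gram as long as its neighbours compensate within the window.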