Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Chronicling Germany: An Annotated Historical Newspaper Dataset

Authors: Christian Schultze, Niklas Kerkfeld, Kara Kuebart, Princilia Weber, Moritz Wolter, Felix Selgert

DMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The paper presents a processing pipeline and establishes baseline results on in- and out-of-domain test data using this pipeline. Both our dataset and the corresponding baseline code are freely available online. ... 4 Experiments and results ... Table 4: Layout detection test set results. This table lists F1 Score values for all individual classes.
Researcher Affiliation Academia Christian Schultze High-Performance Computing and Analytics (HPCA-Lab) Universität Bonn EMAIL; Niklas Kerkfeld HPCA-Lab, Universität Bonn EMAIL; Kara Kuebart Institut für Geschichtswissenschaft Universität Bonn EMAIL; Princilia Weber Institut für Geschichtswissenschaft, Universität Bonn EMAIL; Moritz Wolter HPCA-Lab, Universität Bonn EMAIL; Felix Selgert Institut für Geschichtswissenschaft, Universität Bonn EMAIL
Pseudocode No The paper describes methods and processes in text and flowcharts (e.g., Figure 2: Flow chart of the entire prediction pipeline) but does not include any clearly labeled pseudocode or algorithm blocks with structured steps formatted like code.
Open Source Code Yes Both our dataset and the corresponding baseline code are freely available online. ... Code: https://github.com/Digital-History-Bonn/Chronicling-Germany-Code
Open Datasets Yes The Chronicling Germany dataset contains 801 annotated historical newspaper pages from the time period between 1617 and 1933. ... Both our dataset and the corresponding baseline code are freely available online. ... Data: https://gitlab.uni-bonn.de/digital-history/Chronicling-Germany-Dataset
Dataset Splits Yes We split our data into train, validation, and test datasets (Table 3), where the test dataset consists of in-Distribution (iD) and Out-of-Distribution (OoD) data. ... Table 3: Dataset split (left), test data is divided into in-Distribution (iD)- and Out-of-Distribution (OoD)-data. ... train 651 ... validation 50 ... test iD 80 ... test OoD 20
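The split sizes quoted from Table 3 can be sanity-checked against the reported dataset size of 801 annotated pages; a minimal sketch (the variable names are our own, not from the paper):

```python
# Split sizes as reported in Table 3 of the paper.
splits = {"train": 651, "validation": 50, "test_iD": 80, "test_OoD": 20}

total = sum(splits.values())
assert total == 801  # matches the 801 annotated pages in the dataset

# Fraction of pages in each split, rounded for readability.
fractions = {name: round(n / total, 3) for name, n in splits.items()}
```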
Hardware Specification Yes The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gausscentre.eu) for funding this project by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC).
Software Dependencies No The paper mentions using specific systems and architectures such as 'U-Nets', 'LSTM network', and the 'Kraken-OCR-engine (Kiessling, 2022)'. However, it does not provide specific version numbers for any software libraries, frameworks, or programming languages used for the implementation described in the paper. The '2022' for Kiessling refers to a publication date, not a software version number.
Experiment Setup Yes For layout training... An AdamW optimizer (Loshchilov and Hutter, 2017) trains this network with a learning rate of 0.0001 and a weight decay parameter of 0.001 for 50 epochs in total, while using early stopping to save the best model. ... effective batch size is 128... Following Kodym and Hradis (2021), we train a U-Net for the text-baseline prediction task... We run an AdamW optimizer (Loshchilov and Hutter, 2017) with a learning rate of 0.0001 and a batch size of 16... Based on the Kraken-OCR-engine... we train an LSTM cell for the OCR task... Adam (Kingma and Ba, 2015) optimizes the network with a learning rate of 0.001. Optimization runs for eight epochs with a batch size of 32 sequences.
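The layout-training hyperparameters quoted above (AdamW, learning rate 0.0001, weight decay 0.001, 50 epochs with early stopping) can be sketched in PyTorch. This is an illustrative outline only, not the authors' code: the model, validation loss, and patience value are placeholders we chose for the sketch.

```python
import torch
from torch import nn, optim

# Placeholder stand-in for the paper's layout-segmentation network.
model = nn.Conv2d(3, 10, kernel_size=1)

# Hyperparameters as reported for layout training: AdamW with
# lr = 0.0001 and weight decay = 0.001, for up to 50 epochs.
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-3)

best_loss = float("inf")
patience, bad_epochs = 5, 0  # patience value is our assumption, not from the paper
for epoch in range(50):
    # ... one training pass over the data would go here ...
    val_loss = 1.0 / (epoch + 1)  # dummy value standing in for validation loss
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        # torch.save(model.state_dict(), "best.pt")  # keep the best model
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping, as described in the paper
```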