Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Point at the Triple: Generation of Text Summaries from Knowledge Base Triples
Authors: Pavlos Vougiouklis, Eddy Maddalena, Jonathon Hare, Elena Simperl
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We undertake an automatic and a human evaluation on single and open-domain summaries generation tasks. Both show that our approach significantly outperforms other data-driven baselines. We evaluate our approach in two ways: automatically and manually. For the former, we use the BLEU, ROUGE and METEOR metrics in order to evaluate the performance of our approach in both the widely cited task of biographies generation (...) and the generation of open-domain Wikipedia summaries. Furthermore, we run a user study in which we explore the fluency and coverage of the summaries, as well as the presence of contradictions. In all scenarios, our systems outperform a variety of competing baselines of different natures. |
| Researcher Affiliation | Academia | Pavlos Vougiouklis EMAIL School of Electronics and Computer Science University of Southampton Southampton, United Kingdom Eddy Maddalena EMAIL King's College London London, United Kingdom Jonathon Hare EMAIL School of Electronics and Computer Science University of Southampton Southampton, United Kingdom Elena Simperl EMAIL King's College London London, United Kingdom |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., equations 1-12) but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our dataset along with the code of our systems is available at: https://github.com/pvougiou/Point-at-the-Triple. We make the Full corpus along with the code of our system available on GitHub. |
| Open Datasets | Yes | We train and evaluate our system on two corpora. The first is the D1 Biographies corpus provided by Vougiouklis et al. (2018a), which consists of triples from DBpedia aligned with Wikipedia biographies. The second corpus, which we refer to as the Full corpus, has been built for the purpose of this paper. It uses the same methodology as the one described by Vougiouklis et al., applied, however, to the entire Wikipedia. We extracted DBpedia triples from the Mapping-based Objects and Literals DBpedia datasets. All relevant Wikipedia summaries were extracted using the Long Abstracts DBpedia dataset, retaining only articles with at least a single triple in the Mapping-based Objects and Literals corpora. |
| Dataset Splits | Yes | Both datasets were split into training, validation and test, corresponding to 85%, 10%, and 5% of the data. |
| Hardware Specification | Yes | All systems were trained on a single Titan X (Pascal) GPU. The pointer-generator networks completed an epoch of training in around 36 minutes when trained on biographies, and in around 2 hours for the Full dataset. With this limitation, the model size of the pointer-generator systems is around 10 GB, which is comparable with the available GPU memory of the Titan X (Pascal) GPU used for the evaluation. |
| Software Dependencies | No | We implemented our neural network models using Torch. The paper mentions 'Torch' and provides a URL but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | On the decoder side, we used a single layer of 500 GRUs, and included the \|X\| = 15k and \|X\| = 17k most frequent tokens (...) In all experiments, we set the dimensionality of the hidden states to m = 500. We initialised all parameters with a random uniform distribution between −0.1 and 0.1, and used batch normalisation before each non-linear activation function and after each fully-connected layer (...) Our training objective was to minimise the sum of the negative log-likelihoods of a mini-batch of 80 predicted summaries. Optimisation was performed using Adam (...) with a learning rate of 5 × 10⁻⁵. An l2 regularisation term of 0.05 over the parameters was also included in the cost function. (...) During testing and evaluation, we did beam-search (...) with a beam size of 8 and retained only the summary with the highest probability. |
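The experiment-setup row above can be condensed into a runnable sketch. This is a hedged approximation in PyTorch (the paper's original implementation uses Lua Torch, and this omits the encoder, pointer mechanism, and batch normalisation): a single-layer GRU decoder with m = 500 hidden units, parameters initialised uniformly in [−0.1, 0.1], summed negative log-likelihood over a mini-batch of 80, and Adam with a learning rate of 5 × 10⁻⁵ plus an l2 term of 0.05 (expressed here via `weight_decay`). The names `Decoder`, `VOCAB_SIZE`, and `EMBED_DIM` are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 15_000  # |X| = 15k for the Biographies corpus (17k for Full)
EMBED_DIM = 500      # assumed; the paper reports only the hidden size m = 500
HIDDEN = 500         # m = 500
BATCH = 80           # mini-batch of 80 predicted summaries

class Decoder(nn.Module):
    """Single-layer GRU decoder over the token vocabulary (sketch only)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.gru = nn.GRU(EMBED_DIM, HIDDEN, num_layers=1, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)
        h, hidden = self.gru(x, hidden)
        return self.out(h), hidden

decoder = Decoder()
# Uniform initialisation in [-0.1, 0.1], as reported
for p in decoder.parameters():
    nn.init.uniform_(p, -0.1, 0.1)

# Adam with lr = 5e-5; l2 regularisation approximated with weight_decay
optimizer = torch.optim.Adam(decoder.parameters(), lr=5e-5, weight_decay=0.05)

# One training step: sum of negative log-likelihoods over the mini-batch
criterion = nn.CrossEntropyLoss(reduction="sum")
tokens = torch.randint(0, VOCAB_SIZE, (BATCH, 12))  # dummy token ids
logits, _ = decoder(tokens[:, :-1])                  # predict next token
loss = criterion(logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

At test time the paper additionally applies beam search with a beam size of 8, which is not shown here.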