Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning to Recommend Quotes for Writing

Authors: Jiwei Tan, Xiaojun Wan, Jianguo Xiao

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiment results show that, our proposed approach is appropriate for this task and it outperforms other recommendation methods.
Researcher Affiliation	Academia	Jiwei Tan and Xiaojun Wan and Jianguo Xiao Institute of Computer Science and Technology, Peking University, Beijing 100871, China The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China EMAIL
Pseudocode	No	No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code	No	No explicit statement about releasing the authors' own source code or a direct link to it was found. The paper mentions and links to 'RankLib' (http://people.cs.umass.edu/~vdang/ranklib.html), but this is a third-party tool used by the authors, not their implementation code.
Open Datasets	Yes	We collected quotes from the website of Library of Quotes [1]. ... [1] http://www.libraryofquotes.com ... In order to get real contexts of the quotes, we collected about 20GB raw texts of e-books from Project Gutenberg [2] (Hart 1971) as corpus. ... [2] http://www.gutenberg.org
Dataset Splits	Yes	The 64,323 context-quote pairs were randomly split, according to the proportion of 9:1:1, as training set, validation set and test set, respectively.
Hardware Specification	No	No specific hardware details (such as CPU/GPU models, memory, or cloud instance types) used for running the experiments were mentioned in the paper.
Software Dependencies	No	The paper mentions software like "Porter stemmer" and a "learning to rank tool called RankLib", but it does not provide specific version numbers for these components, which are required for reproducibility.
Experiment Setup	Yes	The parameters of RankLib we use are -norm zscore and -metric2t NDCG@5... In our experiments we select the 1000 quotes with largest similarities to the query context as candidate quotes. ... we also randomly sample 4 negative examples for the training data... The dimension of latent semantic vectors is set to 1000. ... The number of topics is set to 1000. ... The dimension of explicit semantic vectors is 202037. ... The dimension of word vectors is set to 500.
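
Illustrative sketch (not from the paper): the Dataset Splits and Experiment Setup rows mention a random 9:1:1 split of the 64,323 context-quote pairs and an NDCG@5 training metric. The Python below shows one plausible reading of both. The function names, the fixed seed, the rounding rule for the split, and the exponential-gain form of DCG are all assumptions made for illustration; the authors' code is not available, and the tool's internal NDCG computation may differ in detail.

```python
import math
import random


def split_9_1_1(pairs, seed=0):
    """Shuffle and split items into train/validation/test at roughly 9:1:1.

    Assumption: validation and test each take floor(n / 11) items and the
    training set takes the remainder. The paper does not state its rounding.
    """
    items = list(pairs)
    random.Random(seed).shuffle(items)  # hypothetical seed, for repeatability
    n = len(items)
    n_val = n // 11
    n_test = n // 11
    n_train = n - n_val - n_test
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def ndcg_at_k(relevances, k=5):
    """NDCG@k for relevance labels listed in ranked order (best rank first).

    Assumption: gain is 2^rel - 1 with a log2 position discount.
    """
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    train, val, test = split_9_1_1(range(64323))
    print(len(train), len(val), len(test))
    # One correct quote ranked second among the candidates.
    print(round(ndcg_at_k([0, 1, 0, 0, 0]), 4))
```

With a single correct quote per context, NDCG@5 reduces to a position-discounted hit score: it is 1.0 when the correct quote is ranked first and 0.0 when it falls outside the top five.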