Deep Context: A Neural Language Model for Large-scale Networked Documents
Authors: Hao Wu, Kristina Lerman
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on large-scale data collections that include Wikipedia pages, and scientific and legal citation networks. We demonstrate its effectiveness and efficiency on document classification and link prediction tasks. |
| Researcher Affiliation | Academia | Hao Wu (USC ISI, hwu732@usc.edu); Kristina Lerman (USC ISI, lerman@isi.edu) |
| Pseudocode | No | The paper describes algorithms textually and through mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of its source code. |
| Open Datasets | Yes | Wikipedia: A dump of Wikipedia pages in October 2015 is used in our experiments (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2). DBLP: We download the DBLP data set [Tang et al., 2008], which contains a collection of papers with titles and citation links (http://arnetminer.org/lab-datasets/citation/DBLP_citation_2014_May.zip). Legal: We collect a large digitized record of federal court opinions from the Court Listener project (https://www.courtlistener.com/) in our study. A download sketch follows the table. |
| Dataset Splits | Yes | The results are averaged over 5-fold cross-validation on the sampled data. A fold-splitting sketch follows the table. |
| Hardware Specification | Yes | We perform experiments on a single machine with 64 CPU cores at 2.3 GHz and 256 GB of memory. |
| Software Dependencies | No | The paper mentions "Asynchronous stochastic gradient descent algorithm is used with 40 threads to optimize our models" but does not name any software packages or version numbers needed for reproducibility. |
| Experiment Setup | Yes | The dimensionality of word and document vectors is fixed at 400 for all learning models. The number of negative samples is fixed at 5 for Skip-gram, PV, LINE, and DCV. We set the word context window size n = 5 in DCV-vLBL and b = 5 in DCV-ivLBL. The document-sequence context window m is fixed at 1, so only the immediate neighbors that the current document links to are considered. A configuration sketch follows the table. |
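
The Wikipedia and DBLP corpora listed under Open Datasets are plain HTTP downloads. Below is a minimal fetch sketch; the URLs are quoted from the paper and may no longer resolve, and the local file names are simply derived from the URLs. The Legal corpus is omitted because Court Listener distributes its data through bulk exports rather than a single file.

```python
# Minimal sketch for fetching the two directly downloadable corpora.
# URLs are quoted from the paper; they may have moved since 2017.
import urllib.request

DATASETS = {
    "wikipedia": "http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2",
    "dblp": "http://arnetminer.org/lab-datasets/citation/DBLP_citation_2014_May.zip",
}

for name, url in DATASETS.items():
    filename = url.rsplit("/", 1)[-1]  # derive a local file name from the URL
    print(f"Downloading {name} from {url} ...")
    urllib.request.urlretrieve(url, filename)  # saves into the working directory
```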
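
For the 5-fold protocol under Dataset Splits, a minimal sketch follows, assuming document vectors `X` and labels `y` are already materialized as NumPy arrays. The logistic-regression classifier is an assumption; the paper only states that results are averaged over the five folds.

```python
# Sketch of the 5-fold cross-validation protocol described in the table.
# Assumes X (document vectors) and y (labels) are NumPy arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def five_fold_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        clf = LogisticRegression(max_iter=1000)  # assumed classifier choice
        clf.fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(scores))  # average over the 5 folds, as in the paper
```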
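
The hyperparameters under Experiment Setup map directly onto gensim's Doc2Vec, which implements the PV baseline (not DCV itself, which has no public implementation). The sketch below only illustrates the shared settings: 400-dimensional vectors, 5 negative samples, a context window of 5, and 40 worker threads matching the 40 asynchronous SGD threads. The toy corpus and `min_count=1` are assumptions made so the snippet runs standalone.

```python
# Sketch of the PV baseline configured with the paper's shared hyperparameters.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus standing in for the Wikipedia/DBLP/Legal documents.
docs = [
    TaggedDocument(words=["neural", "language", "model"], tags=["d0"]),
    TaggedDocument(words=["citation", "network", "links"], tags=["d1"]),
]

model = Doc2Vec(
    documents=docs,
    vector_size=400,  # word/document vector dimensionality
    negative=5,       # negative samples per positive example
    window=5,         # word context window (n = 5)
    workers=40,       # matches the 40 asynchronous SGD threads
    min_count=1,      # keep all words so the toy corpus survives pruning
)
print(model.dv["d0"].shape)  # (400,)
```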