Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Deep Context: A Neural Language Model for Large-scale Networked Documents
Authors: Hao Wu, Kristina Lerman
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on large-scale data collections that include Wikipedia pages, and scientific and legal citations networks. We demonstrate its effectiveness and efficiency on document classification and link prediction tasks. |
| Researcher Affiliation | Academia | Hao Wu (USC ISI, EMAIL); Kristina Lerman (USC ISI, EMAIL) |
| Pseudocode | No | The paper describes algorithms textually and through mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of its source code. |
| Open Datasets | Yes | Wikipedia: A dump of Wikipedia pages in October 2015 is used in our experiments (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2). DBLP: We download the DBLP data set [Tang et al., 2008] (http://arnetminer.org/lab-datasets/citation/DBLP_citation_2014_May.zip), which contains a collection of papers with titles and citation links. Legal: We collect a large digitized record of federal court opinions from the Court Listener project (https://www.courtlistener.com/) in our study. |
| Dataset Splits | Yes | The results are average over 5-fold cross-validation on the sampled data. |
| Hardware Specification | Yes | We perform experiments on a single machine with 64 CPU cores at 2.3 GHz, and 256G memory. |
| Software Dependencies | No | The paper mentions "Asynchronous stochastic gradient descent algorithm is used with 40 threads to optimize our models" but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | The dimensionality of word and document vectors are fixed as 400 for all learning models. The number of negative sampling is fixed as 5 for Skip-gram, PV, LINE and DCV. We set the word context window size n = 5 in DCV-vLBL and b = 5 in DCV-ivLBL. The context window of document sequence m is fixed as 1 by which we only consider the immediate neighbors that the current document links to. |
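For anyone attempting a reproduction, the settings reported in the table can be gathered into one configuration record. The following is a minimal Python sketch, not the authors' code (none was released). The class, field, and function names are hypothetical, and only the numeric values come from the paper. The fold helper shuffles with a fixed seed and deals items round-robin into folds. That procedure is an assumption: the paper states only that results are averaged over 5-fold cross-validation on the sampled data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepContextSetup:
    """Reported hyperparameters; field names are hypothetical, values are from the paper."""
    vector_dim: int = 400       # word and document vector dimensionality (all models)
    negative_samples: int = 5   # negatives for Skip-gram, PV, LINE and DCV
    word_window_n: int = 5      # word context window n in DCV-vLBL
    word_window_b: int = 5      # word context window b in DCV-ivLBL
    doc_window_m: int = 1       # document context: only immediate linked neighbors
    sgd_threads: int = 40       # threads for asynchronous SGD
    cv_folds: int = 5           # results averaged over 5-fold cross-validation


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: items are shuffled with a fixed seed, then dealt into k folds.
    The paper does not describe how its folds were drawn.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = folds[i]
        # Training set is the concatenation of every other fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Usage: `splits = list(k_fold_indices(n_items, DeepContextSetup().cv_folds))` produces five disjoint train/test partitions. Per-fold scores would then be averaged, which matches the averaging the paper reports. Freezing the dataclass keeps the reported values from being changed by accident during a run.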