Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Topic Models by Neighborhood Aggregation

Authors: Ryohei Hisano

IJCAI 2019 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In experiments, we show that our approach outperforms the state-of-the-art supervised Latent Dirichlet Allocation implementation in terms of held-out document classification tasks. We conduct experiments showing the validity of our approach. We use three datasets in our experiments.
Researcher Affiliation	Academia	Ryohei Hisano Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan EMAIL
Pseudocode	No	The paper describes equations and procedures in text but does not include any formal pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide an explicit statement or link to its own open-source code for the described methodology.
Open Datasets	Yes	The economic watcher survey, in the table abbreviated as EWS, is a dataset provided by the Cabinet Office of Japan. The whole dataset is available at http://www5.cao.go.jp/keizai3/watcher_index.html. Amazon review data are a dataset of gathered ratings and review information [McAuley et al., 2015]... The whole dataset is available at http://jmcauley.ucsd.edu/data/amazon/. Subjectivity data are a dataset provided by [Pang and Lee, 2004]... The whole dataset is available at http://ws.cs.cornell.edu/people/pabo/movie-review-data.
Dataset Splits	Yes	We randomly sample 5000 records for training, development, and testing. Parameters (e.g., the number of hidden units) of these models was found by utilizing the development dataset. We focus on snippets that have more than nine words and sample 1000 snippets each for training, development and testing.
Hardware Specification	No	The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for the experiments.
Software Dependencies	No	The paper mentions software like 'word2vec vectors' and 'standard morphological analysis software' but does not provide specific version numbers for these or any other key software components.
Experiment Setup	Yes	For the regularization parameter governing WS and WC, we set it to 0.001, and for the output function, we set a dropout probability of 0.5 for regularization. We also set η in our model to be 0.2 for the economic watcher survey and 0.05 for the rest, and set the number of hidden units in Eq.(7) to be H1 = 50 and H2 = 50. We also fix the number of topics to 20 for all experiments performed in this section.
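
The values quoted in the Dataset Splits and Experiment Setup rows can be gathered into a small configuration sketch. The code below is not the author's implementation, since the paper provides no open-source code. All class, field, and function names are hypothetical, as is the "ews" dataset key. The quoted text does not state a random seed or whether the three subsets are disjoint, so the seed and the disjoint sampling are assumptions, and the per-split size is left as a parameter.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    weight_regularization: float = 0.001  # regularization parameter governing WS and WC
    dropout_probability: float = 0.5      # dropout applied to the output function
    eta: float = 0.05                     # 0.2 for the economic watcher survey, 0.05 for the rest
    hidden_units_1: int = 50              # H1 in Eq.(7)
    hidden_units_2: int = 50              # H2 in Eq.(7)
    num_topics: int = 20                  # fixed for all experiments


def setup_for(dataset: str) -> ReportedSetup:
    """Return the reported setup; "ews" is a hypothetical key for the economic watcher survey."""
    return ReportedSetup(eta=0.2 if dataset == "ews" else 0.05)


def split_records(records, size_per_split, seed=0):
    """Randomly draw training, development, and test subsets of equal size.

    Assumption: the subsets are disjoint and the seed is arbitrary;
    the quoted text specifies neither.
    """
    n = size_per_split
    pool = list(records)
    if len(pool) < 3 * n:
        raise ValueError("not enough records for three disjoint splits")
    rng = random.Random(seed)
    sampled = rng.sample(pool, 3 * n)  # sampling without replacement
    return sampled[:n], sampled[n:2 * n], sampled[2 * n:]
```

For the subjectivity data, for example, `split_records(snippets, 1000)` would mirror the quoted "1000 snippets each", where `snippets` is a hypothetical list of snippets already filtered to more than nine words.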