Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

On Conceptual Labeling of a Bag of Words

Authors: Xiangyan Sun, Yanghua Xiao, Haixun Wang, Wei Wang

IJCAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on both synthetic data and real data. We also present case studies to verify the rationality of our approach.
Researcher Affiliation	Collaboration	School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China; Google Research, USA
Pseudocode	No	The paper describes the search strategy using prose in Section 3.6, but it does not present it in a structured pseudocode or algorithm block.
Open Source Code	No	The paper states 'Probase data is available at http://probase.msra.cn/dataset.aspx', which refers to a dataset used, not the open-source code for the methodology described in the paper. No explicit statement about making the authors' code available is found.
Open Datasets	Yes	In this paper, we use Probase to provide us fine-grained concepts and their statistics. Probase is acquired from 1.68 billion web pages. It extracts isA relations from sentences matching Hearst patterns [Hearst, 1992]. The core version of Probase contains 3,024,814 unique concepts, 6,768,623 unique instances, and 29,625,920 isA relations. Probase data is available at http://probase.msra.cn/dataset.aspx
Dataset Splits	No	The paper mentions generating 't = 1000 bags of words for evaluation' for synthetic data and manually evaluating '100 test cases randomly selected' from real data, but it does not provide specific train/validation/test dataset split percentages, absolute sample counts for each split, or detailed splitting methodology.
Hardware Specification	No	The paper does not specify the hardware used for running experiments (e.g., CPU or GPU models, memory, or cloud computing infrastructure details).
Software Dependencies	No	The paper mentions using 'LDA [Blei et al., 2003]' but does not provide specific version numbers for this or any other software dependencies used in their experiments.
Experiment Setup	Yes	In Section 3.5, we introduce an additional parameter α to adjust the tradeoff between coverage and minimality. By default α = 0.5. A larger α value indicates the description length of concepts are weighted higher than input words, thus fewer concepts will be generated, vice versa. Section 4.1 of the paper describes varying the parameters `nc`, `ni`, and `nn` to guide the generation of synthetic data.
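
The Experiment Setup row describes α as a weight that trades coverage against minimality. The sketch below illustrates that kind of tradeoff only; it is not the paper's formulation. It assumes a simple weighted two-part cost, α times the description length of the concepts plus (1 − α) times the description length of the input words given those concepts. The function names, the candidate labelings, and all bit counts are hypothetical values chosen for illustration.

```python
def weighted_cost(concept_bits, word_bits, alpha=0.5):
    """Hypothetical weighted two-part cost (an assumption, not the paper's formula).

    concept_bits: description lengths of the chosen concepts
    word_bits:    description lengths of the input words given those concepts
    alpha:        weight on the concept term; (1 - alpha) weights the word term
    """
    return alpha * sum(concept_bits) + (1 - alpha) * sum(word_bits)


def best_labeling(candidates, alpha=0.5):
    """Return the name of the candidate labeling with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda name: weighted_cost(
            candidates[name]["concept_bits"],
            candidates[name]["word_bits"],
            alpha,
        ),
    )


# Two made-up candidate labelings for the same bag of words.
CANDIDATES = {
    # One broad concept: cheap to state, but the words are costly to encode.
    "one_general_concept": {
        "concept_bits": [4.0],
        "word_bits": [6.0, 6.0, 6.0],
    },
    # Three narrow concepts: costly to state, but the words become cheap.
    "three_specific_concepts": {
        "concept_bits": [5.0, 5.0, 5.0],
        "word_bits": [1.0, 1.0, 1.0],
    },
}

if __name__ == "__main__":
    for alpha in (0.2, 0.5, 0.8):
        print(alpha, best_labeling(CANDIDATES, alpha))
```

With these made-up numbers, the three-concept labeling has the lower cost at α = 0.5, while at α = 0.8 the single-concept labeling does, which matches the row's statement that a larger α yields fewer concepts.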