reproducibilityindex.ai

Spectral Label Refinement for Noisy and Missing Text Labels

Authors: Yangqiu Song, Chenguang Wang, Ming Zhang, Hailong Sun, Qiang Yang

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of the label refining algorithm on eight labeled document datasets, and validate that the results are useful for generating better labels. Experiments conducted on eight real-world datasets have shown its power in the following three aspects.
Researcher Affiliation	Academia	Yangqiu Song (a), Chenguang Wang (b), Ming Zhang (b), Hailong Sun (c), Qiang Yang (d). (a) University of Illinois at Urbana-Champaign; (b) Peking University; (c) Beihang University; (d) Hong Kong University of Science and Technology
Pseudocode	Yes	Algorithm 1 DLSR-based Label Refinement Algorithm
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets	Yes	To evaluate our algorithm, we use eight text classification datasets that contain ground-truth labels. Specifically, we use the datasets presented in (Zhong and Ghosh 2005), which are the 20-newsgroups data and the sets from the CLUTO toolkit (Karypis 2002). Eight subsets are selected to test our algorithm; they are summarized in Table 1. The ohscal dataset is from the OHSUMED collection (Hersh et al. 1994). Datasets tr11, tr12, tr23, tr31, tr41 and tr45 are from TREC collections.
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into train/validation/test sets.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions the 'Bow toolkit (McCallum 1996)' and the 'CLUTO toolkit (Karypis 2002)' but does not provide specific version numbers for these or any other ancillary software components.
Experiment Setup	Yes	For example, a noise rate of 40% means that we randomly select 40% of the true labels and randomly permute them. Here, we set the noise rates to 0%, 20%, 40% and 60%. We set a = 1 and b = 0.001 (defined in Definition 3) for this experiment. All data are represented with normalized TF-IDF features. The number of neighbors used to construct the content-based neighborhood graphs for all graph-based algorithms is empirically set to 10.
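The label-noise protocol in the Experiment Setup row (select a fraction of the true labels at random, then randomly permute them) can be sketched as below. This is a minimal illustration assuming NumPy, not the authors' code; `inject_label_noise` and its `seed` argument are hypothetical names.

```python
import numpy as np


def inject_label_noise(labels, noise_rate, seed=0):
    """Hypothetical helper: select `noise_rate` of the documents uniformly at
    random and randomly permute their labels among themselves."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must lie in [0, 1]")
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    n_noisy = int(round(noise_rate * labels.size))
    # Positions whose labels will be shuffled (sampled without replacement).
    idx = rng.choice(labels.size, size=n_noisy, replace=False)
    noisy = labels.copy()
    # Permute the selected labels among the selected positions; the overall
    # class histogram is therefore unchanged.
    noisy[idx] = labels[rng.permutation(idx)]
    return noisy


if __name__ == "__main__":
    y = np.repeat(np.arange(4), 25)  # 100 toy documents, 4 balanced classes
    for rate in (0.0, 0.2, 0.4, 0.6):  # the noise rates listed above
        y_noisy = inject_label_noise(y, rate, seed=1)
        print(f"nominal {rate:.0%}, actually changed {np.mean(y_noisy != y):.0%}")
```

Because the selected labels are permuted among themselves, some documents can receive a label of their own class, so under this reading the fraction of actually wrong labels may fall somewhat below the nominal noise rate.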
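The feature and graph construction in the Experiment Setup row (normalized TF-IDF features and a content-based neighborhood graph with 10 neighbors) might look roughly like the sketch below. The paper does not specify the TF-IDF weighting, the similarity measure, or how the neighbor graph is symmetrized; the `log(N / df)` idf, cosine similarity, and max-symmetrization used here are assumptions, and `normalized_tfidf` and `knn_graph` are hypothetical helper names. Only NumPy is assumed.

```python
import numpy as np


def normalized_tfidf(counts):
    """Assumed weighting: tf * log(N / df), then L2-normalise each document."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # document frequency of each term
    idf = np.log(n_docs / np.maximum(df, 1))     # max(...) avoids dividing by zero
    x = counts * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)          # all-zero rows remain zero


def knn_graph(x, k=10):
    """Symmetric 0/1 adjacency linking each document to its k most similar
    documents; the dot product is cosine similarity because rows are unit length."""
    n = x.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of documents")
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)               # a document is not its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]       # k largest similarities per row
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(w, w.T)                    # assumed: keep an edge if either end chose it


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_counts = rng.poisson(0.5, size=(30, 200))  # 30 toy documents, 200 terms
    W = knn_graph(normalized_tfidf(toy_counts), k=10)
    print(W.shape, W.sum(axis=1).min())
```

Symmetrizing with an elementwise maximum means every document keeps at least k neighbors; a mutual-kNN graph (elementwise minimum) would be an equally plausible alternative that the paper does not rule out.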