Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

An Association Network for Computing Semantic Relatedness

Authors: Keyang Zhang, Kenny Zhu, Seung-won Hwang

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our evaluation results validate that simple algorithms on this network give competitive results in computing semantic relatedness between words and between short texts. This section primarily evaluates two association networks, one constructed only using the original free association norms (denoted as ANfree), and the other constructed through the approach proposed in Section (denoted as ANwiki).
Researcher Affiliation	Academia	Keyang Zhang 1 and Kenny Q. Zhu 2 Shanghai Jiao Tong University, Shanghai, China EMAIL, EMAIL Seung-won Hwang POSTECH, Pohang, Republic of Korea EMAIL
Pseudocode	Yes	Algorithm 1 Generate super node
Open Source Code	No	A demo of our system is available at http://adapt.seiee.sjtu.edu.cn/ keyang/assoc/.
Open Datasets	Yes	The original Florida free association norms data contains 5,019 cue words (which form the set of normed words) and a total of 72,176 cue-response pairs. ... known as Florida Norms from now on. Our test set for evaluating term relatedness is the well-known Word Similarity-353 (Finkelstein et al. 2002) (a.k.a. WS-353 with 353 word pairs). For testing short text similarity, we use the well-known public set Li30 (Li et al. 2006), comprising 30 pairs of short texts. A newly constructed dataset STSS-131 (O'Shea, Bandar, and Crockett 2013) is used to tune the parameter K described in Algorithm 2.
Dataset Splits	No	The paper mentions various datasets (Florida Norms, WS-353, Li30, STSS-131) but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit splitting methodologies) for any of them.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers for libraries, frameworks, or programming languages used in the implementation.
Experiment Setup	Yes	Recall that, Algorithm 2 is parameterized by K determining the extent of expansion. Our reported results use K = 10, empirically tuned based on STSS-131 dataset. As a result, we follow (Wettler 1993) to set α to be 0.66, which, according to them, perform the best in estimating word association.