reproducibilityindex.ai

Discovering Better AAAI Keywords via Clustering with Community-Sourced Constraints

Authors: Kelly Moran, Byron Wallace, Carla Brodley

AAAI 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	we solicited feedback from seven AAAI PC members regarding a previously existing keyword set and used these community-sourced constraints to inform a clustering over the abstracts of all submissions to AAAI 2013. We show that the keywords discovered via this data-driven, human-in-the-loop method are at least as preferred (by AAAI PC members) as 2013's manually generated set, and that they include categories previously overlooked by organizers.
Researcher Affiliation	Collaboration	Kelly Moran, Department of Computer Science, Tufts University, khmoran@google.com; Byron C. Wallace, Health Services Policy and Practice, Brown University, byron_wallace@brown.edu; Carla E. Brodley, Department of Computer Science, Tufts University, brodley@cs.tufts.edu
Pseudocode	No	The paper describes its methodology in text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	No	The paper states that 'Jingjing Liu provided the code for the constrained spectral clustering' but does not explicitly state that the authors' implementation of their full methodology is open-source or publicly available.
Open Datasets	Yes	We have placed the 2013 and 2014 data in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/).
Dataset Splits	Yes	We performed five-fold cross validation five times for each value of k in the identified range and averaged the log likelihoods for the held-out documents.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware (e.g., CPU/GPU models, memory, or computing infrastructure) used to run the experiments.
Software Dependencies	No	The paper mentions various algorithms and models (e.g., Naive Bayes, Spectral Clustering, SVM, Latent Dirichlet Allocation) but does not specify the software implementations or version numbers of any libraries, frameworks, or solvers used for the experiments.
Experiment Setup	Yes	We set λ using the data by selecting a value that maximized the estimated log-likelihood of held-out documents under a simple generative model... The best value under this criterion for λ was 2. This procedure suggested 21 as best value for k within the range of interest.
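
The Dataset Splits and Experiment Setup rows both describe choosing a parameter (k, and separately λ) by averaging held-out log-likelihood over repeated five-fold cross-validation. The Python sketch below illustrates that general selection loop only. It is not the authors' code: the function names are hypothetical, and the add-alpha smoothed unigram scorer is a stand-in assumption for whatever generative model the paper actually used, which this page does not specify.

```python
import math
import random
from collections import Counter
from statistics import mean


def repeated_kfold_indices(n_docs, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, heldout_idx) pairs: n_repeats reshuffles x n_folds folds."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_docs))
        rng.shuffle(order)
        for fold in range(n_folds):
            heldout = order[fold::n_folds]  # every n_folds-th shuffled index
            heldout_set = set(heldout)
            train = [i for i in order if i not in heldout_set]
            yield train, heldout


def select_by_heldout_loglik(docs, candidates, fit_and_score,
                             n_folds=5, n_repeats=5, seed=0):
    """Return (best_candidate, mean_scores).

    fit_and_score(train_docs, heldout_docs, value) is a caller-supplied
    (hypothetical) function that fits a model on train_docs with the candidate
    value and returns the log-likelihood of heldout_docs.
    """
    mean_scores = {}
    for value in candidates:
        scores = []
        # Same seed for every candidate, so all candidates see identical splits.
        for train_idx, heldout_idx in repeated_kfold_indices(
                len(docs), n_folds, n_repeats, seed):
            train = [docs[i] for i in train_idx]
            heldout = [docs[i] for i in heldout_idx]
            scores.append(fit_and_score(train, heldout, value))
        mean_scores[value] = mean(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores


def unigram_heldout_loglik(train, heldout, alpha):
    """Stand-in scorer (an assumption, not the paper's model): log-likelihood of
    held-out tokens under an add-alpha smoothed unigram model; needs alpha > 0."""
    counts = Counter(word for doc in train for word in doc)
    # The vocabulary includes held-out words so unseen tokens get nonzero mass.
    vocab = set(counts) | {word for doc in heldout for word in doc}
    total = sum(counts.values()) + alpha * len(vocab)
    return sum(math.log((counts[word] + alpha) / total)
               for doc in heldout for word in doc)


if __name__ == "__main__":
    # Toy tokenized "abstracts", invented purely for illustration.
    toy_docs = [["planning", "search"], ["learning", "kernel"],
                ["search", "heuristic"], ["learning", "bayes"],
                ["vision", "learning"], ["planning", "robot"],
                ["game", "theory"], ["auction", "game"],
                ["robot", "vision"], ["bayes", "kernel"]]
    best, scores = select_by_heldout_loglik(
        toy_docs, [0.01, 0.1, 1.0], unigram_heldout_loglik)
    print("best smoothing value:", best)
    print("mean held-out log-likelihoods:", scores)
```

To select a number of clusters or topics with this loop, `fit_and_score` would be swapped for a function that fits the model with the candidate k and scores the held-out documents. Reusing the same seed for every candidate is a design choice that keeps the comparison paired across identical folds.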