Discovering Better AAAI Keywords via Clustering with Community-Sourced Constraints

Authors: Kelly Moran, Byron Wallace, Carla Brodley

AAAI 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we solicited feedback from seven AAAI PC members regarding a previously existing keyword set and used these community-sourced constraints to inform a clustering over the abstracts of all submissions to AAAI 2013. We show that the keywords discovered via this data-driven, human-in-the-loop method are at least as preferred (by AAAI PC members) as 2013's manually generated set, and that they include categories previously overlooked by organizers. |
| Researcher Affiliation | Collaboration | Kelly Moran, Department of Computer Science, Tufts University, khmoran@google.com; Byron C. Wallace, Health Services Policy and Practice, Brown University, byron_wallace@brown.edu; Carla E. Brodley, Department of Computer Science, Tufts University, brodley@cs.tufts.edu |
| Pseudocode | No | The paper describes its methodology in text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that 'Jingjing Liu provided the code for the constrained spectral clustering' but does not explicitly state that the authors' implementation of their full methodology is open-source or publicly available. |
| Open Datasets | Yes | We have placed the 2013 and 2014 data in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). |
| Dataset Splits | Yes | We performed five-fold cross validation five times for each value of k in the identified range and averaged the log likelihoods for the held-out documents. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU/GPU models, memory, or computing infrastructure) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., Naive Bayes, Spectral Clustering, SVM, Latent Dirichlet Allocation) but does not specify the software implementations or version numbers of any libraries, frameworks, or solvers used for the experiments. |
| Experiment Setup | Yes | We set λ using the data by selecting a value that maximized the estimated log-likelihood of held-out documents under a simple generative model... The best value under this criterion for λ was 2. This procedure suggested 21 as best value for k within the range of interest. |
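The selection procedure quoted under Dataset Splits and Experiment Setup (five-fold cross-validation repeated five times, averaging held-out log-likelihoods per candidate value) can be sketched roughly as below. This is a minimal illustration, not the authors' code: the function names, the toy documents, and the scorer are all assumptions. In particular, `unigram_heldout_loglik` is a stand-in smoothed unigram model whose tuned value is a smoothing constant; the paper's own generative model is not specified in this summary, so any scorer with the same `(train_docs, test_docs, value)` signature would be substituted for a real run.

```python
import math
import random
from collections import Counter


def repeated_kfold_indices(n_items, n_folds=5, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def unigram_heldout_loglik(train_docs, test_docs, alpha):
    """Hypothetical stand-in scorer: additive-smoothed unigram model.

    Returns the log-likelihood of the held-out documents. The vocabulary
    is taken from train and test together so every word has nonzero mass.
    """
    counts = Counter(word for doc in train_docs for word in doc)
    vocab = set(counts) | {word for doc in test_docs for word in doc}
    total = sum(counts.values()) + alpha * len(vocab)
    return sum(
        math.log((counts[word] + alpha) / total)
        for doc in test_docs
        for word in doc
    )


def select_by_heldout_loglik(docs, candidates, score,
                             n_folds=5, n_repeats=5, seed=0):
    """Average held-out scores over all splits; return the best candidate."""
    averages = {}
    for value in candidates:
        scores = []
        # Reusing the seed gives every candidate the same splits.
        for train, test in repeated_kfold_indices(
                len(docs), n_folds, n_repeats, seed):
            train_docs = [docs[i] for i in train]
            test_docs = [docs[i] for i in test]
            scores.append(score(train_docs, test_docs, value))
        averages[value] = sum(scores) / len(scores)
    best = max(averages, key=averages.get)
    return best, averages


# Toy tokenized "abstracts" (illustrative data only).
toy_docs = [
    ["clustering", "constraints", "keywords"],
    ["spectral", "clustering", "abstracts"],
    ["topic", "model", "abstracts"],
    ["keywords", "topic", "model"],
    ["constraints", "spectral", "clustering"],
    ["planning", "search", "heuristics"],
    ["search", "heuristics", "constraints"],
    ["learning", "model", "keywords"],
    ["learning", "clustering", "topic"],
    ["planning", "learning", "search"],
]

best_value, averages = select_by_heldout_loglik(
    toy_docs, [0.01, 0.1, 1.0], unigram_heldout_loglik)
print("selected value:", best_value)
```

With five folds and five repeats, each candidate is scored on 25 train/test splits. Sharing the splits across candidates keeps the comparison between values paired, which is a design choice of this sketch rather than something the summary states.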