Discovering Better AAAI Keywords via Clustering with Community-Sourced Constraints
Authors: Kelly Moran, Byron Wallace, Carla Brodley
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we solicited feedback from seven AAAI PC members regarding a previously existing keyword set and used these community-sourced constraints to inform a clustering over the abstracts of all submissions to AAAI 2013. We show that the keywords discovered via this data-driven, human-in-the-loop method are at least as preferred (by AAAI PC members) as 2013's manually generated set, and that they include categories previously overlooked by organizers. |
| Researcher Affiliation | Collaboration | Kelly Moran, Department of Computer Science, Tufts University, khmoran@google.com; Byron C. Wallace, Health Services Policy and Practice, Brown University, byron_wallace@brown.edu; Carla E. Brodley, Department of Computer Science, Tufts University, brodley@cs.tufts.edu |
| Pseudocode | No | The paper describes its methodology in text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that 'Jingjing Liu provided the code for the constrained spectral clustering' but does not explicitly state that the authors' implementation of their full methodology is open-source or publicly available. |
| Open Datasets | Yes | We have placed the 2013 and 2014 data in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). |
| Dataset Splits | Yes | We performed five-fold cross validation five times for each value of k in the identified range and averaged the log likelihoods for the held-out documents. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU/GPU models, memory, or computing infrastructure) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., Naive Bayes, Spectral Clustering, SVM, Latent Dirichlet Allocation) but does not specify the software implementations or version numbers of any libraries, frameworks, or solvers used for the experiments. |
| Experiment Setup | Yes | We set λ using the data by selecting a value that maximized the estimated log-likelihood of held-out documents under a simple generative model... The best value under this criterion for λ was 2. This procedure suggested 21 as best value for k within the range of interest. |
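
The Dataset Splits and Experiment Setup rows both describe choosing a value by averaging held-out log-likelihood over repeated five-fold cross-validation. The sketch below illustrates that selection loop only. It is not the authors' code: the function names are hypothetical, and the scoring model is a toy smoothed unigram distribution standing in for the paper's generative model, so the tuned quantity here is a smoothing value rather than the paper's λ or k.

```python
import math
import random
from collections import Counter


def repeated_kfold_indices(n_items, n_folds=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) for n_repeats independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        # Deal the shuffled indices round-robin into n_folds folds.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            test_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds)
                         if f != held_out for i in fold]
            yield train_idx, test_idx


def heldout_log_likelihood(train_docs, test_docs, vocab, smoothing):
    """Toy stand-in model (an assumption): one additively smoothed unigram
    distribution fit on train_docs, scored on test_docs. Requires smoothing > 0."""
    counts = Counter(w for doc in train_docs for w in doc)
    total = sum(counts.values()) + smoothing * len(vocab)
    return sum(math.log((counts[w] + smoothing) / total)
               for doc in test_docs for w in doc)


def select_by_cv(docs, candidates, n_folds=5, n_repeats=5, seed=0):
    """Return the candidate with the highest mean held-out log-likelihood,
    plus the mean score for every candidate."""
    vocab = {w for doc in docs for w in doc}
    mean_ll = {}
    for value in candidates:
        scores = [
            heldout_log_likelihood([docs[i] for i in train_idx],
                                   [docs[i] for i in test_idx],
                                   vocab, value)
            for train_idx, test_idx in repeated_kfold_indices(
                len(docs), n_folds, n_repeats, seed)
        ]
        mean_ll[value] = sum(scores) / len(scores)
    best = max(mean_ll, key=mean_ll.get)
    return best, mean_ll
```

With documents represented as lists of tokens, `select_by_cv(docs, [0.1, 1.0, 2.0])` returns the best-scoring candidate and a dictionary of mean scores. Reusing the same seed for every candidate means all candidates are compared on identical splits.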