Collective Supervision of Topic Models for Predicting Surveys with Social Media
Authors: Adrian Benton, Michael Paul, Braden Hancock, Mark Dredze
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce and explore a variety of topic model variants and provide an empirical analysis, with conclusions about which models are most effective for this task. |
| Researcher Affiliation | Academia | Adrian Benton, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218 (adrian@cs.jhu.edu); Michael J. Paul, College of Media, Communication, and Information, University of Colorado, Boulder, CO 80309 (tmpaul@colorado.edu); Braden Hancock, Department of Electrical Engineering, Stanford University, Stanford, CA 94305 (braden.hancock@stanford.edu); Mark Dredze, Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD 21211 (mdredze@cs.jhu.edu) |
| Pseudocode | Yes | Figure 2: Generative story for the various upstream models. 1. For each document m: (a) y_{c_m} is the feature value associated with the document's collection c_m; (b) α_m ∼ N(y_{c_m}, σ²_α) (adaptive version) or α_m = y_{c_m} (standard version); (c) θ̃_{mk} = exp(b_k + α_m η_k) for each topic k; (d) θ_m ∼ Dirichlet(θ̃_m). 2. For each topic k: (a) φ̃_{kv} = exp(b_v + ω_v η_k) for each word v (words version) or φ̃_{kv} = exp(b_v) (standard version); (b) φ_k ∼ Dirichlet(φ̃_k). 3. For each token n in each document m: (a) sample topic index z_{mn} ∼ θ_m; (b) sample word token w_{mn} ∼ φ_{z_{mn}}. (A toy sampler for this generative story is sketched after this table.) |
| Open Source Code | No | The footnote states: 'Exact values, as well as our datasets can be found at https://github.com/abenton/collsuptmdata'. This explicitly mentions datasets and results, but does not unambiguously state that the source code for the methodology is available. |
| Open Datasets | Yes | Exact values, as well as our datasets can be found at https://github.com/abenton/collsuptmdata |
| Dataset Splits | Yes | Average root mean-squared error (RMSE) was computed using five-fold cross-validation: 80% of the 50 U.S. states were used to train, 10% to tune the ℓ2 regularization coefficient, and 10% were used for evaluation. (A sketch of one plausible split procedure follows this table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions tools like 'Spearmint' and 'AdaGrad' but does not specify software dependencies with version numbers (e.g., Python 3.x, specific library versions). |
| Experiment Setup | Yes | For tuning, we held out 10,000 tweets from the guns dataset and used the best parameters for all datasets. We ran Spearmint (Snoek, Larochelle, and Adams 2012) for 100 iterations to tune the learning parameters, running each sampler for 500 iterations. Spearmint was used to tune the following learning parameters: the initial value for b, and the variance of the Gaussian regularization on b, η, ω, α, and y (in the downstream model). Once tuned, all models were trained for 2000 iterations, using AdaGrad (Duchi, Hazan, and Singer 2011) with a master step size of 0.02. For both perplexity and prediction performance, we sweep over the number of topics in {10, 25, 50, 100} and report the best result. We swept over the ℓ2 regularization coefficient. (A minimal AdaGrad update is sketched after this table.) |
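
The generative story quoted in the Pseudocode row can be made concrete with a small ancestral sampler. The sketch below is not the authors' code; the parameter names (`b_topic`, `b_word`, `eta`, `omega`), the initialization scales, and the default `sigma_alpha` are assumptions chosen only to make the story runnable.

```python
import numpy as np

def generate_corpus(y, collection_of_doc, doc_lengths, K, V,
                    adaptive=True, words_version=True, sigma_alpha=1.0, seed=0):
    """Toy ancestral sampler for the upstream collective-supervision model
    (Figure 2). y[c] is the survey value attached to collection c; the
    initialization scales below are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    b_topic = rng.normal(0.0, 0.1, size=K)   # per-topic bias b_k
    b_word = rng.normal(0.0, 0.1, size=V)    # per-word bias b_v
    eta = rng.normal(0.0, 0.1, size=K)       # topic loadings eta_k
    omega = rng.normal(0.0, 0.1, size=V)     # word loadings omega_v

    # Topic-word distributions: phi_k ~ Dirichlet(exp(b_v + omega_v * eta_k)).
    phi = np.empty((K, V))
    for k in range(K):
        phi_tilde = np.exp(b_word + omega * eta[k]) if words_version else np.exp(b_word)
        phi[k] = rng.dirichlet(phi_tilde)

    docs = []
    for m, n_tokens in enumerate(doc_lengths):
        y_cm = y[collection_of_doc[m]]
        # Adaptive: alpha_m is drawn around the collection's survey value;
        # standard: alpha_m equals it exactly.
        alpha_m = rng.normal(y_cm, sigma_alpha) if adaptive else y_cm
        theta_tilde = np.exp(b_topic + alpha_m * eta)
        theta_m = rng.dirichlet(theta_tilde)          # document-topic distribution
        z = rng.choice(K, size=n_tokens, p=theta_m)   # topic indices z_mn
        w = np.array([rng.choice(V, p=phi[k]) for k in z])  # word tokens w_mn
        docs.append(w)
    return docs

# Example: two collections with survey values 0.3 and 0.7, four short documents.
docs = generate_corpus(y=[0.3, 0.7], collection_of_doc=[0, 0, 1, 1],
                       doc_lengths=[20, 15, 30, 25], K=5, V=100)
```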
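
The Dataset Splits row quotes five-fold cross-validation over the 50 U.S. states with an 80% train / 10% dev / 10% test split per fold. The sketch below shows one plausible way to produce such splits; how states were shuffled and assigned to folds is not specified in the paper, so the block structure here is an assumption.

```python
import numpy as np

def state_cv_splits(states, n_folds=5, seed=0):
    """Yield (train, dev, test) splits of the states: per fold, 10% test,
    10% dev (for tuning the l2 coefficient), 80% train. Fold assignment
    is a guess; the paper does not describe the exact procedure."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(np.asarray(states))
    blocks = np.array_split(order, 2 * n_folds)  # ten blocks of ~5 states each
    for i in range(n_folds):
        test, dev = blocks[2 * i], blocks[2 * i + 1]
        train = np.concatenate([b for j, b in enumerate(blocks)
                                if j not in (2 * i, 2 * i + 1)])
        yield train, dev, test
```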
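
The Experiment Setup row reports training with AdaGrad at a master step size of 0.02 plus a sweep over the number of topics. Below is a minimal AdaGrad step (Duchi, Hazan, and Singer 2011), assuming "master step size" refers to the global learning rate; it is a generic sketch, not the authors' optimizer.

```python
import numpy as np

def adagrad_step(params, grads, grad_sq_sum, master_step=0.02, eps=1e-8):
    """One AdaGrad update: per-parameter step sizes shrink with the
    accumulated squared gradients. eps guards against division by zero."""
    grad_sq_sum += grads ** 2
    params -= master_step * grads / (np.sqrt(grad_sq_sum) + eps)
    return params, grad_sq_sum

# The reported sweep would wrap training in a loop over the number of topics
# K in {10, 25, 50, 100} and keep the best-performing setting.
```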