Learning a Concept Hierarchy from Multi-labeled Documents
Authors: Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Jonathan Chang
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically the effectiveness of L2H in predicting held-out words and labels for unseen documents. |
| Researcher Affiliation | Collaboration | Viet-An Nguyen¹, Jordan Boyd-Graber², Philip Resnik¹,³,⁴, Jonathan Chang⁵; ¹Computer Science, ³Linguistics, ⁴UMIACS, Univ. of Maryland, College Park, MD (vietan@cs.umd.edu, resnik@umd.edu); ²Computer Science, Univ. of Colorado, Boulder, CO (Jordan.Boyd.Graber@colorado.edu); ⁵Facebook, Menlo Park, CA (jonchang@fb.com) |
| Pseudocode | Yes | Figure 1: Generative process and the plate diagram notation of L2H. 1. Create label graph G and draw a uniform spanning tree T from G (§2.1). 2. For each node k ∈ [1, K] in T: (a) if k is the root, draw background topic φ_k ∼ Dir(βu); (b) otherwise, draw topic φ_k ∼ Dir(βφ_σ(k)), where σ(k) is node k's parent. 3. For each document d ∈ [1, D] having labels l_d: (a) define L⁰_d and L¹_d using T and l_d (cf. §2.2); (b) draw θ⁰_d ∼ Dir(L⁰_d α) and θ¹_d ∼ Dir(L¹_d α); (c) draw a stochastic switching variable π_d ∼ Beta(γ₀, γ₁); (d) for each token n ∈ [1, N_d]: (i) draw set indicator x_{d,n} ∼ Bern(π_d); (ii) draw topic indicator z_{d,n} ∼ Mult(θ^{x_{d,n}}_d); (iii) draw word w_{d,n} ∼ Mult(φ_{z_{d,n}}). |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | Data: We use the text and labels from GovTrack for the 109th through 112th Congresses (2005–2012). Policy Agenda Codebook (http://policyagendas.org/) |
| Dataset Splits | Yes | For both quantitative tasks, we perform 5-fold cross-validation. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory specifications, or cloud instances) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions methods such as M3L (a baseline multi-label classifier) and TF-IDF features, but does not name specific libraries, frameworks, or language versions used in the implementation. |
| Experiment Setup | Yes | We run for 1,000 iterations on the training data with a burn-in period of 500 iterations. After the burn-in period, we store ten sets of estimated parameters, one after every fifty iterations. During test time, we run ten chains using these ten learned models on the test data and compute the perplexity after 100 iterations. |
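To make the generative process quoted in the Pseudocode row concrete, here is a minimal forward-sampling sketch in Python. The tree structure, dimensions, and hyperparameter values are illustrative assumptions, not the authors' settings, and the definition of the label sets is simplified: in the paper, L¹_d is derived from the tree T (§2.2), while here the document's labels are used as-is for brevity.

```python
# A minimal sketch of the L2H generative process (Figure 1 of the paper).
# All constants below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

V = 50                      # vocabulary size (assumption)
K = 5                       # number of label nodes (assumption)
parent = [-1, 0, 0, 1, 1]   # sigma(k): a fixed spanning tree T over labels (assumption)
beta, alpha = 10.0, 0.5     # Dirichlet concentration hyperparameters (assumption)
gamma0, gamma1 = 1.0, 1.0   # Beta prior on the switching variable (assumption)

# Step 2: draw topics top-down; each child topic is centered on its parent's.
u = np.full(V, 1.0 / V)     # uniform base measure for the root
phi = np.empty((K, V))
for k in range(K):
    mean = u if parent[k] == -1 else phi[parent[k]]
    phi[k] = rng.dirichlet(beta * mean)

def generate_document(labels, n_tokens=20):
    """Step 3: generate one document with label set l_d (a set of node ids)."""
    # (a) L1_d: the document's labels; L0_d: the remaining nodes
    #     (simplified relative to the paper's tree-based definition).
    L1 = sorted(labels)
    L0 = sorted(set(range(K)) - labels)
    # (b) per-set topic proportions
    theta1 = rng.dirichlet(alpha * np.ones(len(L1)))
    theta0 = rng.dirichlet(alpha * np.ones(len(L0)))
    # (c) switching probability between the two label sets
    pi = rng.beta(gamma0, gamma1)
    words = []
    for _ in range(n_tokens):
        # (d) choose a set via x ~ Bern(pi), a topic within it, then a word
        x = rng.random() < pi
        z = rng.choice(L1 if x else L0, p=theta1 if x else theta0)
        words.append(rng.choice(V, p=phi[z]))
    return words

print(generate_document({1, 3}))
```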
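The Dataset Splits row reports 5-fold cross-validation without further detail. A minimal sketch of such a split, using scikit-learn's KFold (an assumption; the paper does not name a library, and whether the folds were shuffled is not stated):

```python
# Hypothetical 5-fold split for the quoted cross-validation setup.
from sklearn.model_selection import KFold

docs = [f"doc_{i}" for i in range(20)]   # placeholder documents
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(docs)):
    print(fold, len(train_idx), len(test_idx))
```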
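The Experiment Setup row specifies the sampling schedule precisely. A small runnable check that the quoted numbers are consistent (1,000 training iterations, 500 burn-in, one stored model every 50 iterations yields exactly the ten stored parameter sets the paper describes); the loop itself is only an illustration, not the authors' code:

```python
# Verify the stored-model schedule implied by the quoted experiment setup.
TRAIN_ITERS, BURN_IN, LAG = 1000, 500, 50

stored_at = [it for it in range(1, TRAIN_ITERS + 1)
             if it > BURN_IN and (it - BURN_IN) % LAG == 0]
print(stored_at)       # [550, 600, ..., 1000]
print(len(stored_at))  # 10 stored models, as stated in the paper
```

At test time, per the quoted setup, each of these ten stored models would seed one test chain, with perplexity computed after 100 iterations per chain.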