Learning a Concept Hierarchy from Multi-labeled Documents
Authors: Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Jonathan Chang
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically the effectiveness of L2H in predicting held-out words and labels for unseen documents. |
| Researcher Affiliation | Collaboration | Viet-An Nguyen¹, Jordan Boyd-Graber², Philip Resnik¹,³,⁴, Jonathan Chang⁵; ¹Computer Science, ³Linguistics, ⁴UMIACS, Univ. of Maryland, College Park, MD (vietan@cs.umd.edu, resnik@umd.edu); ²Computer Science, Univ. of Colorado, Boulder, CO (Jordan.Boyd.Graber@colorado.edu); ⁵Facebook, Menlo Park, CA (jonchang@fb.com) |
| Pseudocode | Yes | Figure 1: Generative process and the plate diagram notation of L2H. 1. Create label graph G and draw a uniform spanning tree T from G (§2.1). 2. For each node k ∈ [1, K] in T: (a) if k is the root, draw background topic φ_k ∼ Dir(βu); (b) otherwise, draw topic φ_k ∼ Dir(βφ_σ(k)), where σ(k) is node k's parent. 3. For each document d ∈ [1, D] having labels l_d: (a) define L⁰_d and L¹_d using T and l_d (cf. §2.2); (b) draw θ⁰_d ∼ Dir(L⁰_d α) and θ¹_d ∼ Dir(L¹_d α); (c) draw a stochastic switching variable π_d ∼ Beta(γ₀, γ₁); (d) for each token n ∈ [1, N_d]: (i) draw set indicator x_{d,n} ∼ Bern(π_d); (ii) draw topic indicator z_{d,n} ∼ Mult(θ^{x_{d,n}}_d); (iii) draw word w_{d,n} ∼ Mult(φ_{z_{d,n}}). |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | Data: We use the text and labels from GovTrack for the 109th through 112th Congresses (2005–2012). Policy Agenda Codebook (http://policyagendas.org/) |
| Dataset Splits | Yes | For both quantitative tasks, we perform 5-fold cross-validation. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory specifications, or cloud instances) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions methods such as M3L (a baseline multi-label classifier) and TF-IDF features, but does not name specific libraries, frameworks, or language versions used in the implementation. |
| Experiment Setup | Yes | We run for 1,000 iterations on the training data with a burn-in period of 500 iterations. After the burn-in period, we store ten sets of estimated parameters, one after every fifty iterations. During test time, we run ten chains using these ten learned models on the test data and compute the perplexity after 100 iterations. |
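To make the generative process quoted in the Pseudocode row concrete, here is a minimal forward-sampling sketch in Python. The tree structure, dimensions, and hyperparameter values are illustrative assumptions, not the authors' settings, and the definition of the label sets is simplified: in the paper, L¹_d is derived from the tree T (§2.2), while here the document's labels are used as-is for brevity.

```python
# A minimal sketch of the L2H generative process (Figure 1 of the paper).
# All constants below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

V = 50                      # vocabulary size (assumption)
K = 5                       # number of label nodes (assumption)
parent = [-1, 0, 0, 1, 1]   # sigma(k): a fixed spanning tree T over labels (assumption)
beta, alpha = 10.0, 0.5     # Dirichlet concentration hyperparameters (assumption)
gamma0, gamma1 = 1.0, 1.0   # Beta prior on the switching variable (assumption)

# Step 2: draw topics top-down; each child topic is centered on its parent's.
u = np.full(V, 1.0 / V)     # uniform base measure for the root
phi = np.empty((K, V))
for k in range(K):
    mean = u if parent[k] == -1 else phi[parent[k]]
    phi[k] = rng.dirichlet(beta * mean)

def generate_document(labels, n_tokens=20):
    """Step 3: generate one document with label set l_d (a set of node ids)."""
    # (a) L1_d: the document's labels; L0_d: the remaining nodes
    #     (simplified relative to the paper's tree-based definition).
    L1 = sorted(labels)
    L0 = sorted(set(range(K)) - labels)
    # (b) per-set topic proportions
    theta1 = rng.dirichlet(alpha * np.ones(len(L1)))
    theta0 = rng.dirichlet(alpha * np.ones(len(L0)))
    # (c) switching probability between the two label sets
    pi = rng.beta(gamma0, gamma1)
    words = []
    for _ in range(n_tokens):
        # (d) choose a set via x ~ Bern(pi), a topic within it, then a word
        x = rng.random() < pi
        z = rng.choice(L1 if x else L0, p=theta1 if x else theta0)
        words.append(rng.choice(V, p=phi[z]))
    return words

print(generate_document({1, 3}))
```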
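The Dataset Splits row reports 5-fold cross-validation without further detail. A minimal sketch of such a split, using scikit-learn's KFold (an assumption; the paper does not name a library, and whether the folds were shuffled is not stated):

```python
# Hypothetical 5-fold split for the quoted cross-validation setup.
from sklearn.model_selection import KFold

docs = [f"doc_{i}" for i in range(20)]   # placeholder documents
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(docs)):
    print(fold, len(train_idx), len(test_idx))
```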
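The Experiment Setup row specifies the sampling schedule precisely. A small runnable check that the quoted numbers are consistent (1,000 training iterations, 500 burn-in, one stored model every 50 iterations yields exactly the ten stored parameter sets the paper describes); the loop itself is only an illustration, not the authors' code:

```python
# Verify the stored-model schedule implied by the quoted experiment setup.
TRAIN_ITERS, BURN_IN, LAG = 1000, 500, 50

stored_at = [it for it in range(1, TRAIN_ITERS + 1)
             if it > BURN_IN and (it - BURN_IN) % LAG == 0]
print(stored_at)       # [550, 600, ..., 1000]
print(len(stored_at))  # 10 stored models, as stated in the paper
```

At test time, per the quoted setup, each of these ten stored models would seed one test chain, with perplexity computed after 100 iterations per chain.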