Algorithms for Generalized Topic Modeling
Authors: Avrim Blum, Nika Haghtalab
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. (See the illustrative sketch after this table.) |
| Researcher Affiliation | Academia | Avrim Blum, Toyota Technological Institute at Chicago (avrim@ttic.edu); Nika Haghtalab, Computer Science Department, Carnegie Mellon University (nhaghtal@cs.cmu.edu) |
| Pseudocode | Yes | Algorithm 1: "Algorithm for Generalized Topic Models, No Noise"; Algorithm 2: "Algorithm for Generalized Topic Models with Noise" |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. It notes how its model relates to "standard datasets" (citing, e.g., Blei, Ng, and Jordan 2003) but does not use any dataset for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments with dataset splits. Therefore, no validation split information is provided. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies or versions required for implementation or experiments. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup with hyperparameters or training configurations. |
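
The Research Type row above quotes the paper's abstract: topics are complex distributions over sequences of paragraphs, and the goal is to learn, from unlabeled data only, a predictor that maps a new document directly to its topic mixture without representing the topic distributions themselves. The sketch below is **not** the paper's Algorithm 1 or 2; it is a minimal toy illustration of that predictive setup under simplifying assumptions: a hypothetical bag-of-words paragraph featurization, a heuristic choice of "nearly pure" anchor documents from the unlabeled corpus, and non-negative least squares to read off a mixture. All names (`featurize`, `pick_anchor_docs`, `predict_mixture`) are invented for illustration.

```python
# Toy illustration of predicting a topic mixture directly from features,
# without modeling the topic distributions. NOT the paper's algorithm.
import numpy as np
from scipy.optimize import nnls


def featurize(paragraphs, vocab):
    """Average bag-of-words vector over a document's paragraphs.

    A stand-in for whatever paragraph-level feature representation is
    available; the paper leaves the representation abstract.
    """
    index = {w: i for i, w in enumerate(vocab)}
    x = np.zeros(len(vocab))
    for para in paragraphs:
        for word in para.lower().split():
            if word in index:
                x[index[word]] += 1.0
    total = x.sum()
    return x / total if total > 0 else x


def pick_anchor_docs(X, k):
    """Greedily pick k mutually far-apart documents as stand-ins for
    'nearly pure' topic representatives (a heuristic, for illustration only)."""
    anchors = [int(np.argmax(np.linalg.norm(X, axis=1)))]
    while len(anchors) < k:
        dists = np.min(
            np.stack([np.linalg.norm(X - X[a], axis=1) for a in anchors]),
            axis=0)
        anchors.append(int(np.argmax(dists)))
    return X[anchors]


def predict_mixture(x_new, anchor_features):
    """Express a new document's features as a non-negative combination of the
    anchor documents' features; the normalized weights serve as the predicted
    topic mixture."""
    w, _ = nnls(anchor_features.T, x_new)
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))


if __name__ == "__main__":
    # Toy usage: two topic-like clusters in an unlabeled corpus, then a mixed
    # document whose mixture we predict.
    vocab = ["goal", "match", "league", "stock", "market", "bond"]
    corpus = [
        ["goal match match", "league goal"],    # sports-like
        ["stock market bond", "market stock"],  # finance-like
        ["goal league", "stock market bond"],   # mixed
    ]
    X = np.stack([featurize(d, vocab) for d in corpus])
    anchors = pick_anchor_docs(X, k=2)
    print(predict_mixture(featurize(["goal match", "bond market"], vocab), anchors))
```

The paper's actual algorithms come with noise-tolerance and sample-complexity guarantees under its stated conditions; this heuristic sketch makes no attempt to reproduce those guarantees and only illustrates the input/output shape of the learning task.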