A Fast Variational Approach for Learning Markov Random Field Language Models
Authors: Yacine Jernite, Alexander Rush, David Sontag
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we demonstrate the quality of the models learned by our algorithm by applying it to a language modelling task. Additionally we show that this same estimation algorithm can be effectively applied to other common sequence modelling tasks such as part-of-speech tagging. |
| Researcher Affiliation | Collaboration | Yacine Jernite (JERNITE@CS.NYU.EDU), CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA; Alexander M. Rush (SRUSH@SEAS.HARVARD.EDU), Facebook AI Research, 770 Broadway, New York, NY 10003, USA; David Sontag (DSONTAG@CS.NYU.EDU), CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA |
| Pseudocode | Yes | Algorithm 1: Tightening the bound; Algorithm 2: Gradient ascent |
| Open Source Code | No | The paper states: 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency.' This refers to a third-party framework the authors used, not a release of their own source code for the method. |
| Open Datasets | Yes | For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test. |
| Dataset Splits | Yes | For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test. (A sketch of this split appears below the table.) |
| Hardware Specification | No | The paper states: 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency.' This mentions 'the GPU' only generically and does not specify a GPU model or any other hardware details. |
| Software Dependencies | No | The paper mentions 'Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/)'. However, it does not specify a version number for Torch or any other software dependencies. |
| Experiment Setup | Yes | For model parameter optimization (the gradient step in Algorithm 2) we use L-BFGS (Liu & Nocedal, 1989) with backtracking line-search. For tightening the bound (Algorithm 1), we used 200 sub-gradient iterations, each requiring a round of belief propagation. Our sub-gradient rate parameter α was set as α = 10^3 / 2^t, where t is the number of preceding iterations where the dual objective did not decrease. (A sketch of this step-size rule appears below the table.) |
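
For reference, here is a minimal sketch of how the PTB split quoted in the Dataset Splits row could be assembled. The directory layout, file naming, and the `read_sections` helper are assumptions for illustration only; the paper does not release code or specify how the corpus files are stored.

```python
# Minimal sketch of the standard PTB language-modelling split cited in the
# paper: sections 00-20 for training, 21-22 for validation, 23-24 for test.
# The directory layout and per-section file naming below are assumptions.
from pathlib import Path

PTB_DIR = Path("data/ptb")  # hypothetical location of per-section text files


def read_sections(first, last):
    """Return tokenized sentences from WSJ sections first..last (inclusive)."""
    lines = []
    for sec in range(first, last + 1):
        # assumed naming convention: one tokenized file per section
        lines.extend((PTB_DIR / f"wsj_{sec:02d}.txt").read_text().splitlines())
    return [line.split() for line in lines]


train = read_sections(0, 20)   # ~930k tokens
valid = read_sections(21, 22)  # ~74k tokens
test = read_sections(23, 24)   # ~82k tokens
```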
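
The sub-gradient rate rule quoted in the Experiment Setup row (α = 10^3 / 2^t, where t counts preceding iterations in which the dual objective did not decrease) can be sketched as below. The loop structure and the `dual_objective` / `take_subgradient_step` callables are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch of the quoted step-size schedule: the rate starts at 10^3 and
# is halved once for every past iteration where the dual objective failed to
# decrease. Helper callables here are assumptions for illustration.

def run_subgradient(duals, dual_objective, take_subgradient_step, n_iters=200):
    t = 0                      # iterations so far without a dual decrease
    best = float("inf")        # lowest dual value seen so far
    for _ in range(n_iters):   # paper uses 200 iterations, each with a BP round
        alpha = 1e3 / (2 ** t)
        duals = take_subgradient_step(duals, alpha)
        value = dual_objective(duals)
        if value < best:
            best = value       # dual decreased: keep the current rate
        else:
            t += 1             # dual did not decrease: halve the rate
    return duals
```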