A Fast Variational Approach for Learning Markov Random Field Language Models

Authors: Yacine Jernite, Alexander Rush, David Sontag

Venue: ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimentally, we demonstrate the quality of the models learned by our algorithm by applying it to a language modelling task. Additionally we show that this same estimation algorithm can be effectively applied to other common sequence modelling tasks such as part-of-speech tagging."
Researcher Affiliation | Collaboration | Yacine Jernite (jernite@cs.nyu.edu), CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA; Alexander M. Rush (srush@seas.harvard.edu), Facebook AI Research, 770 Broadway, New York, NY 10003, USA; David Sontag (dsontag@cs.nyu.edu), CIMS, New York University, 251 Mercer Street, New York, NY 10012, USA
Pseudocode | Yes | Algorithm 1: Tightening the bound; Algorithm 2: Gradient ascent
Open Source Code | No | The paper states: "Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency." This refers to a third-party framework the authors used, not to a release of their own source code for the method.
Open Datasets | Yes | "For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test."
Dataset Splits | Yes | "For language modelling we ran experiments on the Penn Treebank (PTB) corpus with the standard language modelling setup: sections 0-20 for training (N = 930k), sections 21-22 for validation (N = 74k) and sections 23-24 (N = 82k) for test." (The splits are also summarised in the first sketch after this table.)
Hardware Specification | No | The paper states: "Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/) and runs on the GPU for efficiency." The reference to "the GPU" is too general and does not identify a specific model or any other hardware details.
Software Dependencies | No | The paper mentions "Our implementation of the algorithm uses the Torch numerical framework (http://torch.ch/)", but it does not give a version number for Torch or list any other software dependencies.
Experiment Setup | Yes | "For model parameter optimization (the gradient step in Algorithm 2) we use L-BFGS (Liu & Nocedal, 1989) with backtracking line-search. For tightening the bound (Algorithm 1), we used 200 sub-gradient iterations, each requiring a round of belief propagation. Our sub-gradient rate parameter α was set as α = 10^3/2^t where t is the number of preceding iterations where the dual objective did not decrease." (A schematic sketch of this alternating procedure is given after the table.)
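
For convenience, the split reported in the Dataset Splits row can be written as a small configuration dict. This is only an illustrative summary of the section ranges and approximate token counts quoted above; the dict name and layout are assumptions, not the authors' code.

```python
# Hypothetical summary of the PTB language-modelling split quoted above;
# names and layout are illustrative only.
PTB_SPLITS = {
    "train":      {"sections": "0-20",  "approx_tokens": 930_000},
    "validation": {"sections": "21-22", "approx_tokens": 74_000},
    "test":       {"sections": "23-24", "approx_tokens": 82_000},
}
```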
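
The Pseudocode and Experiment Setup rows describe a two-level procedure: an outer gradient step on the model parameters (L-BFGS with backtracking line search in the paper) and an inner sub-gradient loop that tightens the variational bound, with the step size halved whenever the dual objective fails to decrease. The sketch below is a minimal, runnable illustration of that loop structure only, assuming the step-size rule α = 10^3/2^t quoted above; the toy quadratic dual, the single plain gradient step, and all function names are stand-ins for the paper's belief-propagation-based objective and L-BFGS update, not the authors' implementation.

```python
# A minimal sketch of the alternating procedure reported above
# (cf. "Algorithm 1: Tightening the bound" and "Algorithm 2: Gradient ascent").
# NOT the authors' code: the dual, sub-gradient, and likelihood gradient are
# toy quadratic stand-ins, and a plain gradient step replaces L-BFGS.

import numpy as np


def tighten_bound(dual_fn, subgrad_fn, lam, n_iters=200, alpha0=1e3):
    """Inner loop (Algorithm 1): sub-gradient descent on the dual variables.

    Step size follows the rule quoted above, alpha = 10^3 / 2^t, where t is
    the number of preceding iterations in which the dual did not decrease.
    In the paper each dual evaluation requires a round of belief propagation;
    here it is just a cheap toy function.
    """
    best = np.inf
    t = 0  # iterations so far without a decrease of the dual objective
    for _ in range(n_iters):
        val = dual_fn(lam)
        if val < best:
            best = val
        else:
            t += 1  # the dual did not decrease: halve the step size
        alpha = alpha0 / (2.0 ** t)
        lam = lam - alpha * subgrad_fn(lam)
    return lam, best


def train(theta, lam, like_grad, dual_fn, subgrad_fn, n_outer=5, lr=0.1):
    """Outer loop (Algorithm 2): gradient ascent on the model parameters,
    re-tightening the variational bound before each parameter update."""
    for _ in range(n_outer):
        lam, bound = tighten_bound(dual_fn, subgrad_fn, lam)
        theta = theta + lr * like_grad(theta, lam)  # plain ascent step (paper: L-BFGS)
    return theta, lam


if __name__ == "__main__":
    # Toy quadratic dual, scaled so the first sub-gradient step overshoots
    # and the step-halving rule visibly kicks in.
    c = 3e-3
    dual_fn = lambda lam: 0.5 * c * lam @ lam
    subgrad_fn = lambda lam: c * lam
    like_grad = lambda theta, lam: -(theta - lam)  # pulls theta toward lam
    theta, lam = train(np.ones(3), 10.0 * np.ones(3), like_grad, dual_fn, subgrad_fn)
    print("theta:", theta, "lam:", lam)
```

The step-halving rule is what makes the quoted schedule adaptive: the rate stays at 10^3 while the dual keeps improving and shrinks geometrically once progress stalls, which is why the paper tracks the count of non-decreasing iterations rather than the iteration index itself.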