Reified Context Models

Authors: Jacob Steinhardt, Percy Liang

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our approach obtains expressivity and coverage on three sequence modeling tasks.
Researcher Affiliation | Academia | Jacob Steinhardt (JSTEINHARDT@CS.STANFORD.EDU), Percy Liang (PLIANG@CS.STANFORD.EDU), Stanford University, 353 Serra Street, Stanford, CA 94305 USA
Pseudocode | No | The paper describes the RCMS procedure in text but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | The code, data, and experiments for this paper are available on CodaLab at https://www.codalab.org/worksheets/0x8967960a7c644492974871ee60198401/. Finally, to showcase the ease of implementation of our method, we provide implementation details and runtime comparisons in the supplementary material, as well as runnable source code in our CodaLab worksheet.
Open Datasets | Yes | Handwriting recognition. The first task is the handwriting recognition task from Kassel (1995); we use the clean version of the dataset from Weiss & Taskar (2010). Speech recognition (decoding). Our second task is from the Switchboard speech transcription project (Greenberg et al., 1996). Decipherment. We created a dataset from the English Gigaword corpus (Graff & Cieri, 2003).
Dataset Splits | No | The paper mentions splits for training and testing, but it does not explicitly state the use of a separate validation set for any of the tasks.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper mentions the use of AdaGrad as an optimization algorithm but does not specify versions of any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | To train the models, we maximized the approximate log-likelihood using AdaGrad (Duchi et al., 2010) with a step size η = 0.2 and δ = 10^-4. For each method, we set the beam size to 20. For forced decoding, we used a bigram model with exact inference to impute z. To test RCMS, we trained it in the same way using 20 contexts per position. We used the given plain text to learn the transition probabilities, using absolute discounting (Ney et al., 1994) for smoothing. Then, we used EM to learn the emission probabilities; we used Laplace smoothing for these updates.
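
The quoted setup fixes the AdaGrad hyperparameters (step size η = 0.2, damping δ = 10^-4), while the paper's actual training code lives in the CodaLab worksheet linked above. The snippet below is only a minimal sketch of a per-coordinate AdaGrad update using those reported values; the names (adagrad_step, hist) are illustrative and do not come from the paper.

```python
import numpy as np

def adagrad_step(theta, grad, hist, eta=0.2, delta=1e-4):
    """One per-coordinate AdaGrad update with the step size (eta) and
    damping constant (delta) reported in the experiment setup."""
    hist = hist + grad ** 2                               # accumulate squared gradients
    theta = theta - eta * grad / (delta + np.sqrt(hist))  # per-coordinate scaled step
    return theta, hist

# Illustrative usage on random gradients (not the paper's model or data).
rng = np.random.default_rng(0)
theta, hist = np.zeros(5), np.zeros(5)
for _ in range(3):
    theta, hist = adagrad_step(theta, rng.standard_normal(5), hist)
```

Since the paper maximizes the approximate log-likelihood, the update above should be read as descent on the negative log-likelihood (equivalently, ascent with the sign of the step flipped).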