Image-to-Markup Generation with Coarse-to-Fine Attention

Authors: Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

Venue: ICML 2017

Reproducibility checklist. Each entry gives the variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data.
Researcher Affiliation: Academia. Harvard University and University of Eastern Finland. Correspondence to: Yuntian Deng <dengyuntian@seas.harvard.edu>.
Pseudocode: No. The paper describes its algorithms in text and through network diagrams but does not include explicit pseudocode blocks or algorithm listings.
Open Source Code: Yes. All data, models, and evaluation scripts are publicly available at http://lstm.seas.harvard.edu/latex/.
Open Datasets: Yes. To make these experiments possible, we also construct a new public dataset, IM2LATEX-100K, which consists of a large collection of rendered real-world mathematical expressions collected from published articles.
Dataset Splits: Yes. The dataset is separated into a training set (83,883 equations), a validation set (9,319 equations), and a test set (10,354 equations) for a standardized experimental setup.
Hardware Specification: Yes. Experiments are run on a 12GB Nvidia Titan X GPU (Maxwell).
Software Dependencies: No. The paper names the software used: "The system is built using Torch (Collobert et al., 2011) based on the OpenNMT system (Klein et al., 2017)." However, specific version numbers for Torch or OpenNMT are not provided.
Experiment Setup: Yes. For the standard attention models, a batch size of 20 is used. The initial learning rate is set to 0.1 and halved once the validation perplexity stops decreasing. The model is trained for 12 epochs, and validation perplexity is used to choose the best model. The hierarchical and coarse-to-fine attention models use a batch size of 6. For hard attention, the parameters are initialized from the pretrained hierarchical weights, with an initial learning rate of 0.005, an average reward baseline learning rate β = 0.01, and a reward discount rate γ = 0.5.
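To make the reported hyperparameters easier to scan, the sketch below collects them into one place and illustrates the learning-rate rule (halve once validation perplexity stops decreasing). The configuration names, the halve_lr_if_no_improvement helper, and the sample perplexity values are illustrative assumptions, not code from the authors' released Torch/OpenNMT implementation.

```python
# Illustrative summary of the training hyperparameters reported in the paper.
# All names and the schedule helper are hypothetical; only the numeric values
# come from the paper's experiment-setup description.

STANDARD_ATTENTION = {
    "batch_size": 20,
    "initial_lr": 0.1,
    "epochs": 12,  # best model selected by validation perplexity
}

HIERARCHICAL_COARSE_TO_FINE = {
    "batch_size": 6,  # other schedule details not restated in this row
}

HARD_ATTENTION = {
    "init_from": "pretrained hierarchical weights",
    "initial_lr": 0.005,
    "baseline_lr_beta": 0.01,      # average-reward baseline learning rate
    "reward_discount_gamma": 0.5,  # reward discount rate
}


def halve_lr_if_no_improvement(lr: float, val_ppl: float, best_ppl: float) -> float:
    """Halve the learning rate once validation perplexity stops decreasing."""
    return lr / 2.0 if val_ppl >= best_ppl else lr


if __name__ == "__main__":
    # Walk through the halving rule with made-up validation perplexities.
    lr = STANDARD_ATTENTION["initial_lr"]
    best_ppl = float("inf")
    for val_ppl in [12.0, 9.5, 9.7, 9.6]:  # illustrative values only
        lr = halve_lr_if_no_improvement(lr, val_ppl, best_ppl)
        best_ppl = min(best_ppl, val_ppl)
        print(f"val_ppl={val_ppl:.1f} -> lr={lr}")
```

Running the sketch prints the learning rate staying at 0.1 while perplexity improves and halving on each epoch where it does not, which is the behavior the paper's schedule describes.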