Image-to-Markup Generation with Coarse-to-Fine Attention

Authors: Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush

Venue: ICML 2017

Reproducibility checklist. Each entry gives the variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data.
Researcher Affiliation: Academia. Harvard University and University of Eastern Finland. Correspondence to: Yuntian Deng <dengyuntian@seas.harvard.edu>.
Pseudocode: No. The paper describes its algorithms in text and through network diagrams but does not include explicit pseudocode blocks or algorithm listings.
Open Source Code: Yes. All data, models, and evaluation scripts are publicly available at http://lstm.seas.harvard.edu/latex/.
Open Datasets: Yes. To make these experiments possible, we also construct a new public dataset, IM2LATEX-100K, which consists of a large collection of rendered real-world mathematical expressions collected from published articles.
Dataset Splits: Yes. The dataset is separated into a training set (83,883 equations), a validation set (9,319 equations), and a test set (10,354 equations) for a standardized experimental setup.
Hardware Specification: Yes. Experiments are run on a 12GB Nvidia Titan X GPU (Maxwell).
Software Dependencies: No. The paper names the software used: "The system is built using Torch (Collobert et al., 2011) based on the OpenNMT system (Klein et al., 2017)." However, specific version numbers for Torch or OpenNMT are not provided.
Experiment Setup: Yes. For the standard attention models, a batch size of 20 is used. The initial learning rate is set to 0.1 and halved once the validation perplexity stops decreasing. The model is trained for 12 epochs, and validation perplexity is used to choose the best model. The hierarchical and coarse-to-fine attention models use a batch size of 6. For hard attention, the parameters are initialized from the pretrained hierarchical weights, with an initial learning rate of 0.005, an average reward baseline learning rate β = 0.01, and a reward discount rate γ = 0.5.
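To make the reported hyperparameters easier to scan, the sketch below collects them into one place and illustrates the learning-rate rule (halve once validation perplexity stops decreasing). The configuration names, the halve_lr_if_no_improvement helper, and the sample perplexity values are illustrative assumptions, not code from the authors' released Torch/OpenNMT implementation.

```python
# Illustrative summary of the training hyperparameters reported in the paper.
# All names and the schedule helper are hypothetical; only the numeric values
# come from the paper's experiment-setup description.

STANDARD_ATTENTION = {
    "batch_size": 20,
    "initial_lr": 0.1,
    "epochs": 12,  # best model selected by validation perplexity
}

HIERARCHICAL_COARSE_TO_FINE = {
    "batch_size": 6,  # other schedule details not restated in this row
}

HARD_ATTENTION = {
    "init_from": "pretrained hierarchical weights",
    "initial_lr": 0.005,
    "baseline_lr_beta": 0.01,      # average-reward baseline learning rate
    "reward_discount_gamma": 0.5,  # reward discount rate
}


def halve_lr_if_no_improvement(lr: float, val_ppl: float, best_ppl: float) -> float:
    """Halve the learning rate once validation perplexity stops decreasing."""
    return lr / 2.0 if val_ppl >= best_ppl else lr


if __name__ == "__main__":
    # Walk through the halving rule with made-up validation perplexities.
    lr = STANDARD_ATTENTION["initial_lr"]
    best_ppl = float("inf")
    for val_ppl in [12.0, 9.5, 9.7, 9.6]:  # illustrative values only
        lr = halve_lr_if_no_improvement(lr, val_ppl, best_ppl)
        best_ppl = min(best_ppl, val_ppl)
        print(f"val_ppl={val_ppl:.1f} -> lr={lr}")
```

Running the sketch prints the learning rate staying at 0.1 while perplexity improves and halving on each epoch where it does not, which is the behavior the paper's schedule describes.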