Graph-to-Graph: Towards Accurate and Interpretable Online Handwritten Mathematical Expression Recognition

Authors: Jin-Wen Wu, Fei Yin, Yan-Ming Zhang, Xu-Yao Zhang, Cheng-Lin Liu. Pages 2925-2933.

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on CROHME datasets to demonstrate the benefits of the proposed G2G model. Our method yields significant improvements over previous SOTA image-to-markup systems.
Researcher Affiliation | Academia | Jin-Wen Wu1,2, Fei Yin1, Yan-Ming Zhang1, Xu-Yao Zhang1,2, Cheng-Lin Liu1,2,3. 1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 CAS Center for Excellence of Brain Science and Intelligence Technology. {jinwen.wu, fyin, ymzhang, xyz, liucl}@nlpr.ia.ac.cn
Pseudocode | No | The paper describes methods and models but does not contain a dedicated pseudocode block or algorithm listing.
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We evaluate our model on the large public dataset available from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) (Mouchère et al. 2016).
Dataset Splits | Yes | The CROHME training set contains 8,835 formulas with both symbol-level and expression-level annotations, and the test sets for CROHME 2013/2014/2016 contain 671/986/1,147 formulas, respectively. Consistent with participating systems in CROHME, we use the test set of CROHME 2013 as a validation set in the training stage, and use the test sets of CROHME 2014 and 2016 to evaluate our proposed model.
Hardware Specification | Yes | Our models were implemented in PyTorch and optimized on two 12GB Nvidia TITAN X GPUs.
Software Dependencies | No | The paper mentions implementation in PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The coefficients of different supervision losses are set experimentally. Specifically, we set λ1 = λ2 = λ6 = 0.5 to impose the same supervision on learning the representations of the nodes, edges and sub-graphs in the input graph. The supervision loss coefficients for the generation of nodes and edges in the target graph are set to λ2 = λ3 = 1. We set λ5 = 0.3 to guide the distribution of attention coefficients on the source sub-graphs. The proposed model is optimized via adaptive moment estimation (Adam; Kingma and Ba 2015) with learning rate 5e-4. Both the decoder and the encoder stack 3 GNN blocks. The network for pre-extracting the input primitive features has 4 blocks. We use 256 for the embedding dimension C of the decoder GNN, and 400 for the dimension C of sub-graph attention.
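The reported experiment setup amounts to a weighted multi-task loss plus an Adam optimizer. The sketch below shows how the quoted coefficients might be wired together; it is illustrative only, and the loss names (src_node, tgt_edge, etc.) are assumptions, not the paper's notation.

```python
# Illustrative sketch of the reported loss weighting (not the authors' code).
# Source-graph node/edge/sub-graph supervision uses weight 0.5; target-graph
# node/edge generation uses weight 1.0; attention supervision uses 0.3.
LOSS_WEIGHTS = {
    "src_node": 0.5,
    "src_edge": 0.5,
    "src_subgraph": 0.5,
    "tgt_node": 1.0,
    "tgt_edge": 1.0,
    "attention": 0.3,
}

def combined_loss(losses):
    """Weighted sum of the individual supervision losses.

    `losses` maps a loss name to its scalar value (a float here,
    or a PyTorch tensor in a real training loop).
    """
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())

# Optimizer per the paper, e.g. in PyTorch:
#   optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
```

With all component losses equal to 1.0, the combined loss is 0.5 * 3 + 1.0 * 2 + 0.3 = 3.8, which makes the relative emphasis on target-graph generation easy to see.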