Handwritten Mathematical Expression Recognition via Attention Aggregation Based Bi-directional Mutual Learning

Authors: Xiaohang Bian, Bo Qin, Xiaozhe Xin, Jianwu Li, Xuefeng Su, Yanfeng Wang

Venue: AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our proposed approach achieves the recognition accuracy of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation and model ensembling, substantially outperforming the state-of-the-art methods.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, Beijing Institute of Technology, China; (2) AI Interaction Department, Tencent, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/XH-B/ABM.
Open Datasets | Yes | We train our models based on the CROHME 2014 competition dataset with 111 classes of mathematical symbols and 8836 handwritten mathematical expressions.
Dataset Splits | No | The paper mentions a validation process for early stopping ("training will stop early when the learning rate drops 10 times") but does not give specifics of the validation split (e.g., percentages, sample counts, or how it was derived from the training data).
Hardware Specification | Yes | All the models are trained/tested on a single NVIDIA V100 16GB GPU.
Software Dependencies | No | The paper mentions the Adadelta optimizer but does not specify version numbers for any software dependencies, such as the programming language, deep-learning framework (e.g., PyTorch, TensorFlow), or libraries.
Experiment Setup | Yes | Our proposed method is optimized with the Adadelta optimizer, and its learning rate starts from 1, decaying by a factor of two when the WER does not decrease within 15 epochs. Training stops early when the learning rate drops 10 times. We set the batch size to 16. For the decoder, we set n = 256, d = 512, D = 684, and K = 113 (adding sos and eos to the 111 labels). In the loss function, λ is set to 0.5.
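
For context, the schedule quoted in the Experiment Setup row can be written as a short training-loop skeleton. The following is a minimal sketch assuming PyTorch; `train_one_epoch`, `evaluate_wer`, and the placeholder model are hypothetical stand-ins and not the authors' released code (see https://github.com/XH-B/ABM for that).

```python
import torch

# Sketch of the paper's quoted schedule: Adadelta with lr starting at 1,
# halved when validation WER stagnates for 15 epochs, training stopped
# after the 10th halving. All loop bodies below are placeholders.

PATIENCE = 15        # epochs without WER improvement before halving the lr
MAX_LR_DROPS = 10    # stop once the lr has been halved 10 times
MAX_EPOCHS = 1000    # upper bound; the early-stopping rule usually fires first
BATCH_SIZE = 16      # per the paper's setup
LAMBDA = 0.5         # the paper's loss weight λ (not exercised in this sketch)

model = torch.nn.Linear(8, 4)  # hypothetical stand-in for the ABM network
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)  # lr starts from 1


def train_one_epoch(model, optimizer):
    """Placeholder: one pass over the CROHME training set (batch size 16)."""


def evaluate_wer(model):
    """Placeholder: word error rate on a held-out validation set."""
    return 1.0


best_wer, stale_epochs, lr_drops = float("inf"), 0, 0
for epoch in range(MAX_EPOCHS):
    train_one_epoch(model, optimizer)
    wer = evaluate_wer(model)
    if wer < best_wer:
        best_wer, stale_epochs = wer, 0   # WER improved; reset patience
    else:
        stale_epochs += 1
        if stale_epochs >= PATIENCE:      # WER stagnant for 15 epochs
            stale_epochs = 0
            lr_drops += 1
            for group in optimizer.param_groups:
                group["lr"] *= 0.5        # decay the lr by a factor of two
            if lr_drops >= MAX_LR_DROPS:
                break                     # early stop after the 10th drop
```

A plateau-halving rule capped at a fixed number of decays reproduces the behavior the paper describes, but the exact reset and stopping logic in the released code may differ.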