A Tree-Structured Decoder for Image-to-Markup Generation

Authors: Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Lirong Dai

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on both math formula recognition and chemical formula recognition, the proposed tree decoder is shown to greatly outperform strong string decoder baselines.
Researcher Affiliation | Collaboration | Jianshu Zhang (1, 2), Jun Du (1), Yongxin Yang (3), Yi-Zhe Song (3), Si Wei (2), Lirong Dai (1). 1: NEL-SLIP Lab, University of Science and Technology of China, Hefei, Anhui, China; 2: iFLYTEK Research, Hefei, Anhui, China; 3: SketchX Lab, University of Surrey, Guildford, Surrey, United Kingdom.
Pseudocode | No | The paper describes its methods using text and equations but does not include a structured pseudocode or algorithm block.
Open Source Code | No | "Source code and the toy datasets will be publicly released to facilitate future research."
Open Datasets | Yes | "For math formula recognition, we evaluate our model on the CROHME benchmark (Mouchère et al., 2016b;a; Mahdavi et al., 2019), which is currently the largest dataset for online handwritten math formula recognition. ... The SMILES (Jin et al., 2018) dataset provides a large amount of printed chemical formula images and corresponding SMILES strings."
Dataset Splits | Yes | "We choose 100,000 chemical formulas from the SMILES dataset. We use 90,000 formulas as the train set, 3,000 formulas as the validation set and the other 7,000 as the test set."
Hardware Specification | Yes | "Experiments were conducted on a single Nvidia Tesla V100 with 16GB RAM."
Software Dependencies | No | The paper only mentions "PyTorch" without specifying its version number or any other software dependencies with versions.
Experiment Setup | Yes | "We employ three dense blocks in the main branch. We set the growth rate to k = 24 and the depth (number of convolution layers) of each block to D = 32. Both the child and parent decoders adopt 2 unidirectional GRU layers; each layer has 256 forward GRU units. The child attention dimension, parent attention dimension and memory attention dimension are set to 512. The embedding dimensions for both child node and parent node are set to 256. We employ the ADADELTA algorithm (Zeiler, 2012) for optimization, with hyperparameters ρ = 0.95 and ε = 10⁻⁶. A beam size of 3 is set for both the parent and child beams in our experiments."
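The reported decoding setup uses beam search with a beam size of 3 for both the parent and child beams. As an illustration of that decoding strategy, here is a minimal, generic beam search over a step-wise log-probability function in pure Python. This is a sketch, not the authors' tree-decoder code; `step_logprobs` is a hypothetical scoring callback standing in for one decoder step.

```python
def beam_search(step_logprobs, beam_size=3, max_len=50, eos=0):
    """Generic beam search (illustrative sketch, not the authors' code).

    step_logprobs(prefix) -> {next_token: log_probability} for one decoder step.
    Keeps the `beam_size` best unfinished hypotheses per step (the paper
    reports beam_size = 3) and returns the highest-scoring finished sequence.
    """
    beams = [((), 0.0)]          # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best hypotheses; sequences ending in EOS are set aside.
        beams = []
        for seq, score in candidates:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda c: c[1])[0]
```

In the paper's tree decoder, such a search would run twice per step: one beam over parent nodes and one over child nodes, each with beam size 3.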
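The optimizer settings above (ADADELTA with ρ = 0.95 and ε = 10⁻⁶) can be made concrete by writing out the update rule from Zeiler (2012) for a single scalar parameter. This is a re-derivation for illustration, not the authors' training code; in practice one would use a library optimizer such as PyTorch's Adadelta with these hyperparameters.

```python
def adadelta_step(g, state, rho=0.95, eps=1e-6):
    """One ADADELTA update (Zeiler, 2012) for a scalar parameter.

    state = (E[g^2], E[dx^2]), the running averages of squared gradients
    and squared updates. Returns (dx, new_state), where dx is the update
    to add to the parameter. Hyperparameters match the paper's reported
    rho = 0.95 and eps = 1e-6.
    """
    eg2, edx2 = state
    eg2 = rho * eg2 + (1 - rho) * g * g            # accumulate gradient^2
    dx = -((edx2 + eps) ** 0.5 / (eg2 + eps) ** 0.5) * g  # scaled step
    edx2 = rho * edx2 + (1 - rho) * dx * dx        # accumulate update^2
    return dx, (eg2, edx2)
```

The RMS ratio makes the step size self-tuning, which is why ADADELTA is typically run without a hand-set learning rate.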