Plane Geometry Diagram Parsing

Authors: Ming-Liang Zhang, Fei Yin, Yi-Han Hao, Cheng-Lin Liu

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on PGDP5K and an existing dataset IMP-Geometry3K show that our model outperforms state-of-the-art methods in four sub-tasks remarkably.
Researcher Affiliation | Academia | National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; School of Electronic Information Engineering, Beijing Jiaotong University
Pseudocode | No | The paper describes the model architecture and components (e.g., FPN, FCOS, GSM, GNN) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGDP.
Open Datasets | Yes | Also, to facilitate the research of PGDP, we build a new large-scale geometry diagram dataset named PGDP5K, labeled with annotations of primitive locations, classes and their relations. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGDP. http://www.nlpr.ia.ac.cn/databases/CASIA-PGDP5K
Dataset Splits | Yes | We randomly split the dataset into three subsets: train set (3,500), validation set (500) and test set (1,000).
Hardware Specification | Yes | We train our model in 40K iterations with batch size of 12 on 4 TITAN-Xp GPUs.
Software Dependencies | No | The paper mentions using 'PyTorch and FCOS framework' but does not specify their version numbers or any other software dependencies with specific versions.
Experiment Setup | Yes | We choose the Adam optimizer with an initial learning rate 5e-4, weight decay 1e-4, step decline schedule decaying with a rate of 0.2 at 20K, 30K and 35K iterations. We train our model in 40K iterations with batch size of 12 on 4 TITAN-Xp GPUs. The NDM, GSM and VLEM all use 3 groups of 128-channel convolution layers with corresponding BatchNorm layers. The segmentation embedding dimensionality is 8 and the visual-location embedding dimensionality is 64. The layer number of GM is 5 and the feature dimensionalities of nodes and edges are all set to 64.
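As a concrete reading of the Experiment Setup row, the sketch below shows how the reported hyperparameters (Adam, initial learning rate 5e-4, weight decay 1e-4, decay factor 0.2 at 20K/30K/35K iterations, 40K iterations total with batch size 12) could be expressed with a standard PyTorch optimizer and MultiStepLR scheduler. This is an illustrative reconstruction, not the authors' released training code; the model and the dummy batch are placeholders.

```python
# Minimal sketch of the reported optimization schedule (assumption: plain
# PyTorch Adam + MultiStepLR; `model` is a stand-in, not the paper's PGDPNet).
import torch

model = torch.nn.Linear(64, 64)  # placeholder network

optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
# Step decay by a factor of 0.2 at 20K, 30K and 35K iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20_000, 30_000, 35_000], gamma=0.2
)

max_iters = 40_000  # 40K iterations with batch size 12 reported in the paper
for step in range(max_iters):
    optimizer.zero_grad()
    dummy_batch = torch.randn(12, 64)        # stands in for a real data loader
    loss = model(dummy_batch).pow(2).mean()  # stands in for the parsing losses
    loss.backward()
    optimizer.step()
    scheduler.step()  # schedule is defined in iterations, so step every iteration
```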