A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

Authors: Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on PGPS9K and an existing dataset Geometry3K validate the superiority of our method over the state-of-the-art neural solvers.
Researcher Affiliation | Academia | MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Pseudocode | No | The paper includes architectural diagrams (Figure 1, Figure 3) and descriptions of methods, but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS.
Open Datasets | Yes | Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS. We build a new large-scale, finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution programs; PGPS9K is the largest and most completely annotated dataset for GPS to date. Dataset page: http://www.nlpr.ia.ac.cn/databases/CASIA-PGPS9K
Dataset Splits | No | The paper specifies training and test splits for the datasets (e.g., 'training set 8,433' and 'test set (589)', or 'training set 8,022 and test set 1,000'), but it does not specify a separate validation split with sizes or percentages.
Hardware Specification | Yes | Our model is implemented using PyTorch on one GTX-RTX GPU.
Software Dependencies | No | The paper states 'Our model is implemented using Pytorch' but does not specify the version of PyTorch or of any other software dependency.
Experiment Setup | Yes | The CNN model adopts ResNet10 [He et al., 2016], fed with diagram images resized to 128×128. The language model is a transformer encoder [Vaswani et al., 2017] with 6 layers, 8 attention heads, and a hidden embedding size of 1024. The GRU encoder is a two-layer bidirectional GRU [Cho et al., 2014] with input embedding size 256 and hidden state size 512. The self-limited decoder is a two-layer GRU with input embedding size and hidden state size both set to 512. The random probability of data augmentation is set to 0.7 in pre-training and 0.5 in training. We choose the AdamW optimizer [Loshchilov and Hutter, 2017] with weight decay 1e-2 and a step decay schedule with decay rate 0.5. During pre-training, the learning rate of the language model is initialized to 5e-4, decaying at 1K, 2K, and 3K epochs over a total of 4K epochs. During training, all modules of PGPSNet are trained together with an initial learning rate of 5e-5 for the language model and 1e-3 for the other modules, decaying at epochs 160, 280, 360, 440, and 500, for a total of 540 epochs. In addition, the batch size and dropout rate are set to 128 and 0.2 in all processes.
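
For concreteness, the training-stage optimizer and learning-rate settings quoted above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' released code: the module definitions are simple placeholders standing in for the PGPSNet components, and only the hyperparameters (per-group learning rates, weight decay, decay milestones and rate, dropout, batch size, epoch count) follow the paper's description.

```python
# Sketch of the reported training hyperparameters (assumed placeholder modules,
# not the released PGPSNet implementation).
import torch
import torch.nn as nn

# Placeholder stand-ins for the actual PGPSNet components.
language_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=8, dropout=0.2, batch_first=True),
    num_layers=6,
)
other_modules = nn.ModuleDict({
    "cnn": nn.Conv2d(3, 64, kernel_size=7),  # stands in for ResNet10
    "gru_encoder": nn.GRU(256, 512, num_layers=2, bidirectional=True, dropout=0.2),
    "decoder": nn.GRU(512, 512, num_layers=2, dropout=0.2),  # self-limited decoder
})

# Training stage: 5e-5 for the language model, 1e-3 for the other modules,
# AdamW with weight decay 1e-2.
optimizer = torch.optim.AdamW(
    [
        {"params": language_model.parameters(), "lr": 5e-5},
        {"params": other_modules.parameters(), "lr": 1e-3},
    ],
    weight_decay=1e-2,
)

# Step decay by 0.5 at epochs 160, 280, 360, 440, and 500; 540 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[160, 280, 360, 440, 500], gamma=0.5
)

BATCH_SIZE = 128
NUM_EPOCHS = 540

for epoch in range(NUM_EPOCHS):
    # One pass over the training loader (batch size 128) with optimizer.step()
    # calls would go here; the scheduler is then advanced once per epoch.
    scheduler.step()
```

The pre-training stage described above would use the same pattern with a single parameter group for the language model, an initial learning rate of 5e-4, and milestones at 1K, 2K, and 3K epochs out of 4K.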