GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Authors: Xin Hu, Lingling Zhang, Jun Liu, Jinfu Fan, Yang You, Yaqiang Wu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate that the proposed GPTR achieves the best results on the diagram object detection task. Our model also obtains results comparable to the competitors on natural image object detection. We conduct experiments on a diagram dataset, AI2D*, and a benchmark of natural images, MSCOCO, to verify the effectiveness of GPTR.
Researcher Affiliation | Collaboration | 1 Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, China; 2 National Engineering Lab for Big Data Analytics, Xi'an Jiaotong University, China; 3 Department of Control Science and Engineering, Tongji University, Shanghai, China; 4 Department of Computer Science, National University of Singapore, Singapore; 5 Lenovo Research, Beijing, China
Pseudocode | No | The paper describes the model architecture and components in text and diagrams (e.g., Figure 3), but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | AI2D* is composed of diagrams in the original AI2D dataset (Kembhavi et al. 2016), ... MSCOCO (Lin et al. 2014) is a large-scale object detection dataset...
Dataset Splits | No | The paper specifies training and testing sets: 'AI2D* dataset... is divided into a train set with 1,634 diagrams and a test set with 404 diagrams.' and 'MSCOCO... comprises 118,287 images for training and 5,000 images for testing.' However, it does not explicitly mention a separate validation split.
Hardware Specification | Yes | All the models are trained and evaluated on NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions using the 'AdamW optimizer' but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, CUDA, or other libraries.
Experiment Setup | Yes | The learning rate is initially set to 10^-4 and the AdamW optimizer is used in GPTR. The weight decay is set to 10^-4 and the dropout rate in the transformer is 0.1. ... We resize all images of the two datasets to 224×224×3 and each image is divided into 196 patches. The dimension of each patch feature is d_c = d_p = d_e = 256. We set 50 and 100 object queries for the AI2D* and MSCOCO datasets, respectively.
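
For concreteness, the reported setup can be summarized as a minimal PyTorch sketch. This is not the authors' implementation (the paper announces no code release): the 16×16 patch size is inferred from the reported numbers (224 / 16 = 14 patches per side, so 14 × 14 = 196 patches), and a plain nn.Transformer stands in for GPTR, whose gestalt-perception modules are omitted here.

```python
import torch
from torch import nn

PATCH = 16               # inferred: 224 / 16 = 14, so 14 * 14 = 196 patches
EMBED_DIM = 256          # d_c = d_p = d_e = 256
NUM_QUERIES = 50         # 50 object queries for AI2D*, 100 for MSCOCO

# ViT-style patch embedding via a strided convolution.
patch_embed = nn.Conv2d(3, EMBED_DIM, kernel_size=PATCH, stride=PATCH)

# Stand-in encoder-decoder; the actual GPTR adds gestalt-perception
# modules on top of a structure like this.
transformer = nn.Transformer(d_model=EMBED_DIM, nhead=8,
                             dropout=0.1, batch_first=True)
queries = nn.Parameter(torch.zeros(NUM_QUERIES, EMBED_DIM))

# Optimizer settings as reported: AdamW, lr 1e-4, weight decay 1e-4.
params = (list(patch_embed.parameters())
          + list(transformer.parameters()) + [queries])
optimizer = torch.optim.AdamW(params, lr=1e-4, weight_decay=1e-4)

# Shape check: one 224x224x3 image yields 196 tokens of dimension 256,
# decoded into NUM_QUERIES object embeddings.
img = torch.randn(1, 3, 224, 224)
tokens = patch_embed(img).flatten(2).transpose(1, 2)   # (1, 196, 256)
out = transformer(tokens, queries.unsqueeze(0))        # (1, 50, 256)
print(tokens.shape, out.shape)
```

The shape check at the end confirms the arithmetic in the reported setup: resizing to 224×224 and tiling with 16×16 patches yields exactly the 196 patch tokens the paper describes.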