GPTR: Gestalt-Perception Transformer for Diagram Object Detection
Authors: Xin Hu, Lingling Zhang, Jun Liu, Jinfu Fan, Yang You, Yaqiang Wu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that the proposed GPTR achieves the best results in the diagram object detection task. Our model also obtains comparable results over the competitors in natural image object detection. We conduct experiments on a diagram dataset AI2D* and a benchmark MSCOCO of natural images to verify the effectiveness of GPTR. |
| Researcher Affiliation | Collaboration | 1 Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, School of Computer Science and Technology, Xi'an Jiaotong University, China; 2 National Engineering Lab for Big Data Analytics, Xi'an Jiaotong University, China; 3 Department of Control Science and Engineering, Tongji University, Shanghai, China; 4 Department of Computer Science, National University of Singapore, Singapore; 5 Lenovo Research, Beijing, China |
| Pseudocode | No | The paper describes the model architecture and components in text and diagrams (e.g., Figure 3), but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | AI2D* is composed of diagrams in the original AI2D dataset (Kembhavi et al. 2016), ... MSCOCO (Lin et al. 2014) is a large-scale object detection dataset... |
| Dataset Splits | No | The paper specifies training and testing sets: 'AI2D* dataset... is divided into a train set with 1,634 diagrams and a test set with 404 diagrams.' and 'MSCOCO... comprises 118,287 images for training and 5,000 images for testing.' However, it does not explicitly mention a separate validation split. |
| Hardware Specification | Yes | All the models are trained and evaluated on NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, CUDA, or other libraries. |
| Experiment Setup | Yes | The learning rate is initially set to 10^-4 and the AdamW optimizer is used in GPTR. The weight decay is set to 10^-4 and the dropout rate in the transformer is 0.1. ... We resize all images of the two datasets to 224×224×3 and each image is divided into 196 patches. The dimension of each patch feature is d_c = d_p = d_e = 256. We set 50 and 100 object queries for the AI2D* and MSCOCO datasets, respectively. |
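
For readers attempting reproduction, the hyperparameters in the Experiment Setup row can be assembled into a minimal PyTorch sketch. The skeleton below is hypothetical, not GPTR itself: since no code is released, the module name `TinySketch`, the 16×16 patch embedding, the head count, and the encoder/decoder depths are assumptions; only the optimizer (AdamW), learning rate (10^-4), weight decay (10^-4), dropout (0.1), 224×224×3 inputs, 196 patches, feature dimension 256, and the 50/100 query counts come from the paper.

```python
import torch
from torch import nn

PATCH_SIZE = 16   # assumption: 224 / 16 = 14, giving 14 * 14 = 196 patches
D_MODEL = 256     # d_c = d_p = d_e = 256 as reported
NUM_QUERIES = 50  # 50 for AI2D*, 100 for MSCOCO

class TinySketch(nn.Module):
    """Hypothetical DETR-style skeleton using the paper's reported sizes."""

    def __init__(self, num_queries=NUM_QUERIES, d_model=D_MODEL):
        super().__init__()
        # Non-overlapping conv patchifier: (B, 3, 224, 224) -> (B, 256, 14, 14)
        self.patch_embed = nn.Conv2d(3, d_model,
                                     kernel_size=PATCH_SIZE, stride=PATCH_SIZE)
        # dropout=0.1 as reported; layer/head counts are placeholders
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dropout=0.1, batch_first=True)
        self.queries = nn.Embedding(num_queries, d_model)

    def forward(self, images):
        x = self.patch_embed(images)          # (B, 256, 14, 14)
        x = x.flatten(2).transpose(1, 2)      # (B, 196, 256) patch tokens
        q = self.queries.weight.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.transformer(x, q)         # (B, num_queries, 256)

model = TinySketch()
# Optimizer settings as reported: AdamW, lr = 1e-4, weight decay = 1e-4
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 50, 256])
```

Note that `nn.Transformer(batch_first=True)` requires PyTorch 1.9 or later; the paper gives no framework versions (see the Software Dependencies row), so this sketch should be treated as a starting point rather than a faithful reconstruction of GPTR's architecture.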