Recognizing Vector Graphics without Rasterization
Authors: Xinyang Jiang, Lu Liu, Caihua Shan, Yifei Shen, Xuanyi Dong, Dongsheng Li
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that by directly operating on vector graphics, YOLaT outperforms raster-graphic based object detection baselines in terms of both average precision and efficiency. Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition. To evaluate our pipeline over vector graphics, we use two datasets, i.e., floorplans and diagrams, and show the advantages of our method over the raster-graphics-based object detection baselines. We compare YOLaT with two types of object detection methods: one-stage methods, i.e., Yolov3 [14], Yolov4 [15, 40] and its variants, and RetinaNet [6]; and two-stage methods, i.e., Faster R-CNN with Feature Pyramid Network (FPN) [41] and its variants. Table 1: Performance comparison on the floorplan dataset. |
| Researcher Affiliation | Collaboration | Xinyang Jiang1, Lu Liu2, Caihua Shan1, Yifei Shen3, Xuanyi Dong2, Dongsheng Li1 — 1Microsoft Research Asia {xinyangjiang,caihua.shan,dongsheng.li}@microsoft.com; 2University of Technology Sydney u.liu.cs@icloud.com, xuanyi.dxy@gmail.com; 3The Hong Kong University of Science and Technology yshenaw@connect.ust.hk |
| Pseudocode | No | The paper describes the model architecture and steps (e.g., 'Graph Construction', 'Feature Extraction with Dual-stream GNN') but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition. |
| Open Datasets | Yes | We use SESYD, which is a public database containing different types of vector graphic documents, with the corresponding object detection groundtruth, produced using the 3gT system (http://mathieu.delalandre.free.fr/projects/sesyd/). |
| Dataset Splits | Yes | Floorplans. ... We divide half of the layouts as the training data and the other half for validation and test. The ratio of the validation and test data is 1:9. Diagrams. ... the dataset is split as 600, 41 and 359 images for training, validation and test stage. |
| Hardware Specification | Yes | The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card. The inference time is evaluated on an Nvidia V100. |
| Software Dependencies | No | The paper mentions specific implementations for baselines, such as 'ultralytics2 [43]' and 'Detectron2 [44]', which include version information in their citations. However, it does not provide specific version numbers for the general software components or libraries used for their own method, YOLaT (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.0025 and a batch size of 16. For data augmentation, we randomly translate and scale the vector graphics by at most 10% of the image width and height, and the transformed vector graphics are further rotated by a random angle. The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card. |
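The dataset splits quoted above (half of the floorplan layouts for training, the remainder divided 1:9 between validation and test) can be sketched as a small helper. This is a minimal sketch for checking the reported split sizes, not code from the paper; the function name and the shuffling seed are assumptions.

```python
import random

def split_floorplans(layout_ids, seed=0):
    """Sketch of the split described in the paper: half of the
    layouts for training, and the remaining half divided 1:9
    between validation and test. Hypothetical helper, not the
    authors' code."""
    rng = random.Random(seed)
    ids = list(layout_ids)
    rng.shuffle(ids)
    n_train = len(ids) // 2
    train, rest = ids[:n_train], ids[n_train:]
    n_val = len(rest) // 10  # validation : test = 1 : 9
    return train, rest[:n_val], rest[n_val:]
```

For 100 layouts this yields 50 training, 5 validation, and 45 test layouts, matching the 1:9 validation-to-test ratio stated in the paper.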
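The augmentation in the setup row (random translation and scaling by at most 10% of the image width and height, followed by rotation by a random angle) can be sketched as an affine transform on the vector graphic's control points. This is an illustrative sketch only: the point-list representation, function name, and the choice to rotate about the image centre are assumptions, and the paper's actual pipeline operates on Bezier-curve graphs.

```python
import math
import random

def augment_points(points, width, height, seed=None):
    """Sketch of the augmentation described in the paper: random
    translation and scaling by at most 10% of the image size,
    followed by rotation by a random angle. `points` is a list of
    (x, y) control points; hypothetical helper, not the authors' code."""
    rng = random.Random(seed)
    tx = rng.uniform(-0.1, 0.1) * width   # translation <= 10% of width
    ty = rng.uniform(-0.1, 0.1) * height  # translation <= 10% of height
    s = 1.0 + rng.uniform(-0.1, 0.1)      # scaling within +/-10%
    theta = rng.uniform(0.0, 2.0 * math.pi)  # random rotation angle
    cx, cy = width / 2.0, height / 2.0    # assumed rotation centre
    out = []
    for x, y in points:
        # scale, then translate
        x, y = x * s + tx, y * s + ty
        # rotate around the image centre
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(theta) - dy * math.sin(theta),
                    cy + dx * math.sin(theta) + dy * math.cos(theta)))
    return out
```

Because the augmentation acts on coordinates rather than pixels, it avoids the resampling artifacts that raster-based augmentation would introduce, which is consistent with the paper's motivation for operating on vector graphics directly.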