Recognizing Vector Graphics without Rasterization

Authors: Xinyang Jiang, Lu Liu, Caihua Shan, Yifei Shen, Xuanyi Dong, Dongsheng Li

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that by directly operating on vector graphics, YOLaT outperforms raster-graphic based object detection baselines in terms of both average precision and efficiency. Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition. To evaluate our pipeline over vector graphics, we use two datasets, i.e., floorplans and diagrams, and show the advantages of our method over the raster-graphics-based object detection baselines. We compare YOLaT with two types of object detection methods: one-stage methods, i.e., YOLOv3 [14], YOLOv4 [15, 40] and its variants, and RetinaNet [6]; and two-stage methods, i.e., Faster R-CNN with Feature Pyramid Network (FPN) [41] and its variants. Table 1: Performance comparison on the floorplan dataset.
Researcher Affiliation Collaboration Xinyang Jiang1, Lu Liu2, Caihua Shan1, Yifei Shen3, Xuanyi Dong2, Dongsheng Li1. 1Microsoft Research Asia {xinyangjiang,caihua.shan,dongsheng.li}@microsoft.com; 2University of Technology Sydney u.liu.cs@icloud.com, xuanyi.dxy@gmail.com; 3The Hong Kong University of Science and Technology yshenaw@connect.ust.hk
Pseudocode No The paper describes the model architecture and steps (e.g., 'Graph Construction', 'Feature Extraction with Dual-stream GNN') but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition.
Open Datasets Yes We use SESYD, which is a public database containing different types of vector graphic documents, with the corresponding object detection ground truth, produced using the 3gT system (http://mathieu.delalandre.free.fr/projects/sesyd/).
Dataset Splits Yes Floorplans. ... We use half of the layouts as the training data and the other half for validation and test. The ratio of the validation and test data is 1:9. Diagrams. ... the dataset is split into 600, 41 and 359 images for the training, validation and test stages.
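The floorplan split described above (half the layouts for training, the remainder divided 1:9 between validation and test) can be sketched as follows. This is an illustrative helper, not code from the authors' repository; the function name and the use of layout indices as items are assumptions.

```python
import random


def split_floorplans(layouts, seed=0):
    """Illustrative split following the quoted description: half of the
    layouts for training, the remaining half divided 1:9 between
    validation and test. Names and shuffling strategy are assumptions."""
    rng = random.Random(seed)
    layouts = list(layouts)
    rng.shuffle(layouts)
    half = len(layouts) // 2
    train, rest = layouts[:half], layouts[half:]
    n_val = len(rest) // 10  # 1:9 validation:test ratio
    return train, rest[:n_val], rest[n_val:]


train, val, test = split_floorplans(range(100))
print(len(train), len(val), len(test))  # 50 5 45
```

The split sizes depend only on the number of layouts, so the 1:9 ratio is reproducible regardless of the shuffle seed.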
Hardware Specification Yes The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card. The inference time is evaluated on an Nvidia V100.
Software Dependencies No The paper mentions specific implementations for baselines, such as 'ultralytics2 [43]' and 'Detectron2 [44]', which include version information in their citations. However, it does not provide specific version numbers for the general software components or libraries used for their own method, YOLaT (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We use the Adam optimizer with a learning rate of 0.0025 and a batch size of 16. For data augmentation, we randomly translate and scale the vector graphics by at most 10% of the image width and height, and the transformed vector graphics are further rotated by a random angle. The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card.
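Since the method operates on vector graphics rather than raster images, the quoted augmentation (translation and scale of at most 10% of the image size, plus a random rotation) would act directly on point coordinates. A minimal sketch of such an augmentation, assuming points are (x, y) tuples and transforms are applied about the image centre (the function name and these conventions are assumptions, not the authors' code):

```python
import math
import random


def augment_points(points, width, height, rng=None):
    """Hypothetical coordinate-level augmentation matching the quoted
    setup: random translation and scaling by at most 10% of the image
    size, followed by rotation by a random angle about the centre."""
    rng = rng or random.Random(0)
    tx = rng.uniform(-0.1, 0.1) * width   # translate <= 10% of width
    ty = rng.uniform(-0.1, 0.1) * height  # translate <= 10% of height
    s = 1.0 + rng.uniform(-0.1, 0.1)      # scale within +/-10%
    theta = rng.uniform(0.0, 2.0 * math.pi)
    cx, cy = width / 2.0, height / 2.0
    out = []
    for x, y in points:
        # scale about the centre, then translate
        x = (x - cx) * s + cx + tx
        y = (y - cy) * s + cy + ty
        # rotate about the centre
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(theta) - dy * math.sin(theta),
                    cy + dx * math.sin(theta) + dy * math.cos(theta)))
    return out
```

The remaining quoted hyperparameters (Adam, learning rate 0.0025, batch size 16, 200 epochs) would be applied in a standard PyTorch training loop in the authors' repository.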