Recognizing Vector Graphics without Rasterization

Authors: Xinyang Jiang, Lu Liu, Caihua Shan, Yifei Shen, Xuanyi Dong, Dongsheng Li

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that by directly operating on vector graphics, YOLaT outperforms raster-graphic based object detection baselines in terms of both average precision and efficiency. Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition. To evaluate our pipeline over vector graphics, we use two datasets, i.e., floorplans and diagrams, and show the advantages of our method over the raster-graphics-based object detection baselines. We compare YOLaT with two types of object detection methods: one-stage methods, i.e., YOLOv3 [14], YOLOv4 [15, 40] and its variants, and RetinaNet [6]; and two-stage methods, i.e., Faster R-CNN with Feature Pyramid Network (FPN) [41] and its variants. Table 1: Performance comparison on the floorplan dataset.
Researcher Affiliation Collaboration Xinyang Jiang1, Lu Liu2, Caihua Shan1, Yifei Shen3, Xuanyi Dong2, Dongsheng Li1. 1Microsoft Research Asia {xinyangjiang,caihua.shan,dongsheng.li}@microsoft.com; 2University of Technology Sydney u.liu.cs@icloud.com, xuanyi.dxy@gmail.com; 3The Hong Kong University of Science and Technology yshenaw@connect.ust.hk
Pseudocode No The paper describes the model architecture and steps (e.g., 'Graph Construction', 'Feature Extraction with Dual-stream GNN') but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Code is available at https://github.com/microsoft/YOLaT-VectorGraphicsRecognition.
Open Datasets Yes We use SESYD, which is a public database containing different types of vector graphic documents, with the corresponding object detection ground truth, produced using the 3gT system (http://mathieu.delalandre.free.fr/projects/sesyd/).
Dataset Splits Yes Floorplans. ... We use half of the layouts as the training data and the other half for validation and test. The ratio of the validation and test data is 1:9. Diagrams. ... the dataset is split into 600, 41 and 359 images for the training, validation and test stages.
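The floorplan split described above (half the layouts for training, the remainder divided 1:9 between validation and test) can be sketched as follows. This is an illustrative helper, not code from the authors' repository; the function name and the use of layout indices as items are assumptions.

```python
import random


def split_floorplans(layouts, seed=0):
    """Illustrative split following the quoted description: half of the
    layouts for training, the remaining half divided 1:9 between
    validation and test. Names and shuffling strategy are assumptions."""
    rng = random.Random(seed)
    layouts = list(layouts)
    rng.shuffle(layouts)
    half = len(layouts) // 2
    train, rest = layouts[:half], layouts[half:]
    n_val = len(rest) // 10  # 1:9 validation:test ratio
    return train, rest[:n_val], rest[n_val:]


train, val, test = split_floorplans(range(100))
print(len(train), len(val), len(test))  # 50 5 45
```

The split sizes depend only on the number of layouts, so the 1:9 ratio is reproducible regardless of the shuffle seed.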
Hardware Specification Yes The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card. The inference time is evaluated on an Nvidia V100.
Software Dependencies No The paper mentions specific implementations for baselines, such as 'ultralytics2 [43]' and 'Detectron2 [44]', which include version information in their citations. However, it does not provide specific version numbers for the general software components or libraries used for their own method, YOLaT (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We use the Adam optimizer with a learning rate of 0.0025 and a batch size of 16. For data augmentation, we randomly translate and scale the vector graphics by at most 10% of the image width and height, and the transformed vector graphics are further rotated by a random angle. The model is trained for 200 epochs from scratch, which takes around 2 hours on an Nvidia V100 graphics card.
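Since the method operates on vector graphics rather than raster images, the quoted augmentation (translation and scale of at most 10% of the image size, plus a random rotation) would act directly on point coordinates. A minimal sketch of such an augmentation, assuming points are (x, y) tuples and transforms are applied about the image centre (the function name and these conventions are assumptions, not the authors' code):

```python
import math
import random


def augment_points(points, width, height, rng=None):
    """Hypothetical coordinate-level augmentation matching the quoted
    setup: random translation and scaling by at most 10% of the image
    size, followed by rotation by a random angle about the centre."""
    rng = rng or random.Random(0)
    tx = rng.uniform(-0.1, 0.1) * width   # translate <= 10% of width
    ty = rng.uniform(-0.1, 0.1) * height  # translate <= 10% of height
    s = 1.0 + rng.uniform(-0.1, 0.1)      # scale within +/-10%
    theta = rng.uniform(0.0, 2.0 * math.pi)
    cx, cy = width / 2.0, height / 2.0
    out = []
    for x, y in points:
        # scale about the centre, then translate
        x = (x - cx) * s + cx + tx
        y = (y - cy) * s + cy + ty
        # rotate about the centre
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(theta) - dy * math.sin(theta),
                    cy + dx * math.sin(theta) + dy * math.cos(theta)))
    return out
```

The remaining quoted hyperparameters (Adam, learning rate 0.0025, batch size 16, 200 epochs) would be applied in a standard PyTorch training loop in the authors' repository.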