Spatial as Deep: Spatial CNN for Traffic Scene Understanding
Authors: Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SCNN on a newly released, very challenging traffic lane detection dataset and the Cityscapes dataset. The results show that SCNN can learn the spatial relationship for structured output and significantly improves performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) on the lane detection dataset by 8.7% and 4.6% respectively. (A minimal sketch of SCNN's directional message passing is given after the table.) |
| Researcher Affiliation | Collaboration | ¹The Chinese University of Hong Kong, ²SenseTime Group Limited |
| Pseudocode | No | The paper contains mathematical equations and diagrams, but no structured pseudocode or algorithm blocks are explicitly labeled or formatted as such. |
| Open Source Code | Yes | Code is available at https://github.com/XingangPan/SCNN |
| Open Datasets | Yes | In this paper, we present a large-scale, challenging dataset for traffic lane detection. To collect data, we mounted cameras on six different vehicles driven by different drivers and recorded videos while driving in Beijing on different days. More than 55 hours of videos were collected and 133,235 frames were extracted, which is more than 20 times the size of the TuSimple dataset. We have divided the dataset into 88880 for the training set, 9675 for the validation set, and 34680 for the test set. It also mentions: 'the recently released TuSimple Benchmark Dataset (TuSimple 2017) consists of 1224 and 6408 images with annotated lane markings respectively'. |
| Dataset Splits | Yes | We have divided the dataset into 88880 for the training set, 9675 for the validation set, and 34680 for the test set. (These sizes sum exactly to the 133,235 extracted frames; see the arithmetic check after the table.) |
| Hardware Specification | No | Table 6 lists only generic device classes ('CPU', 'GPU') for its runtime comparison but does not specify exact models (e.g., NVIDIA A100, Intel Xeon E5) or detailed specifications (e.g., amount of RAM, clock speed) for the hardware used in the experiments, which is required for reproducibility. |
| Software Dependencies | Yes | All experiments are implemented on the Torch7 (Collobert, Kavukcuoglu, and Farabet 2011) framework. |
| Experiment Setup | Yes | In both tasks, we train the models using standard SGD with batch size 12, base learning rate 0.01, momentum 0.9, and weight decay 0.0001. The learning rate policy is 'poly', with power and iteration number set to 0.9 and 60K respectively (see the schedule sketch after the table). The initial weights of the first 13 convolution layers are copied from VGG16 (Simonyan and Zisserman 2015) trained on ImageNet (Deng et al. 2009). The output channel number of the fc7 layer is set to 128, the rate for the atrous convolution layer of fc6 is set to 4, and batch normalization (Ioffe and Szegedy 2015) is added before each ReLU layer. During training, the line width of the targets is set to 16 pixels, and the input and target images are rescaled to 800 × 288. The loss of background is multiplied by 0.4. |
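
For context on the method the rows above evaluate: SCNN replaces layer-by-layer convolution with slice-by-slice convolution within a feature map, passing messages downward, upward, rightward, and leftward, with a ReLU applied to each message before it is added to the next slice. The following is a minimal PyTorch sketch of that scheme, not the authors' implementation (which is in Torch7, at the repository linked above); the class name `SpatialCNN`, the default channel count, and the stand-in feature-map size are assumptions, while the kernel width of 9 matches the paper's experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialCNN(nn.Module):
    """Sketch of SCNN's four directional slice-wise message-passing passes
    (down, up, right, left) over a backbone feature map."""

    def __init__(self, channels: int = 128, kernel_width: int = 9):
        super().__init__()
        pad = kernel_width // 2
        # One 1-D convolution per direction, shared across all slices of that pass.
        self.conv_d = nn.Conv2d(channels, channels, (1, kernel_width), padding=(0, pad), bias=False)
        self.conv_u = nn.Conv2d(channels, channels, (1, kernel_width), padding=(0, pad), bias=False)
        self.conv_r = nn.Conv2d(channels, channels, (kernel_width, 1), padding=(pad, 0), bias=False)
        self.conv_l = nn.Conv2d(channels, channels, (kernel_width, 1), padding=(pad, 0), bias=False)

    @staticmethod
    def _propagate(x: torch.Tensor, conv: nn.Conv2d, dim: int, reverse: bool = False) -> torch.Tensor:
        # Cut the map into slices along `dim`; each slice receives the
        # convolved, ReLU-gated message from its already-updated neighbor.
        slices = list(torch.unbind(x, dim=dim))
        if reverse:
            slices.reverse()
        out = [slices[0]]
        for s in slices[1:]:
            out.append(s + F.relu(conv(out[-1].unsqueeze(dim)).squeeze(dim)))
        if reverse:
            out.reverse()
        return torch.stack(out, dim=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature map from the backbone (VGG16 in the paper).
        x = self._propagate(x, self.conv_d, dim=2)                # top -> bottom
        x = self._propagate(x, self.conv_u, dim=2, reverse=True)  # bottom -> top
        x = self._propagate(x, self.conv_r, dim=3)                # left -> right
        x = self._propagate(x, self.conv_l, dim=3, reverse=True)  # right -> left
        return x

# Example: a 128-channel feature map of assumed spatial size 36 x 100.
feat = torch.randn(2, 128, 36, 100)
print(SpatialCNN()(feat).shape)  # torch.Size([2, 128, 36, 100])
```

Because each slice depends on the updated previous slice, the passes are sequential along one spatial axis but fully convolutional along the other, which is what lets SCNN propagate information across long, thin structures like lane markings.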
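As a quick consistency check on the quoted dataset splits, the three subsets sum exactly to the 133,235 extracted frames. A few lines of Python:

```python
# Split sizes quoted in the Dataset Splits row above.
splits = {"train": 88880, "val": 9675, "test": 34680}
total = sum(splits.values())
assert total == 133235  # equals the number of extracted frames

for name, n in splits.items():
    print(f"{name}: {n:>6} frames ({n / total:.1%})")
# train:  88880 frames (66.7%)
# val:      9675 frames (7.3%)
# test:    34680 frames (26.0%)
```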
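The 'poly' policy quoted in the Experiment Setup row decays the learning rate as lr = base_lr × (1 − iter/max_iter)^power. Below is a small PyTorch re-expression of that schedule with the quoted hyperparameters; the paper itself used Torch7, and the `Linear` model here is only a stand-in for the actual network.

```python
import torch

# Hyperparameters quoted in the Experiment Setup row above.
BASE_LR, POWER, MAX_ITER = 0.01, 0.9, 60_000

model = torch.nn.Linear(10, 2)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR,
                            momentum=0.9, weight_decay=1e-4)
# 'poly' policy: lr = BASE_LR * (1 - iter / MAX_ITER) ** POWER
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda it: (1.0 - it / MAX_ITER) ** POWER)

# Sample values of the schedule:
for it in (0, 15_000, 30_000, 45_000):
    print(f"iter {it:>6}: lr = {BASE_LR * (1 - it / MAX_ITER) ** POWER:.6f}")
# iter      0: lr = 0.010000
# iter  15000: lr = 0.007719
# iter  30000: lr = 0.005359
# iter  45000: lr = 0.002872
```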