Fine-Grained Car Detection for Visual Census Estimation

Authors: Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Li Fei-Fei

AAAI 2017

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. In Tab. 1 we present aggregate statistics of our fine-grained car dataset, which has a total of 712,430 images and 382,591 bounding boxes. Table 1: Dataset statistics for our training, validation, and test splits.
  Attribute | Accuracy
  Make | 66.38%
  Model | 51.83%
  Submodel | 77.74%
  Price | 61.61%
  Domestic/Foreign | 87.71%
  Country | 84.21%
  Table 2: Classification accuracy on the test set for various car attributes.

Researcher Affiliation | Academia | Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Li Fei-Fei. Department of Computer Science, Stanford University, {tgebru, jkrause, yilunw, duchen, feifeili}@cs.stanford.edu; Department of Computer Science, University of Michigan, jiadeng@umich.edu

Pseudocode | No | No, the paper describes its methods in prose without including structured pseudocode or algorithm blocks.

Open Source Code | No | No, the paper states 'We make our dataset publicly available' but does not provide any statement or link for the open-sourcing of their methodology's code.

Open Datasets | Yes | We make our dataset publicly available and anticipate its use by computer vision researchers focused on fine-grained recognition.

Dataset Splits | Yes | Table 1: Dataset statistics for our training, validation, and test splits.
  Attribute | Training | Validation | Test
  Street View Images | 199,666 | 39,933 | 159,732
  Product Shot Images | 313,099 | - | -
  Total Images | 512,765 | 39,933 | 159,732
  Street View BBoxes | 34,712 | 6,915 | 27,865
  Product Shot BBoxes | 313,099 | - | -
  Total BBoxes | 347,811 | 6,915 | 27,865

Hardware Specification | Yes | With this architecture, we detected cars on our entire dataset in less than two weeks with 200 2.1 GHz CPU cores.

Software Dependencies | No | No, the paper mentions high-level approaches and models like 'deformable part models (DPM)' and 'convolutional neural network (CNN) with an architecture following (Krizhevsky, Sutskever, and Hinton 2012)', but does not provide specific software versions for libraries, frameworks, or programming languages used.

Experiment Setup | Yes | After extensive cross validation, we decided upon a single component DPM with 8 parts, achieving an average precision (AP) of 64.2% at 5 seconds per Street View image. We made three modifications to the traditional CNN training procedure to improve our classifier performance. First, we seek to prevent our classifier from overfitting on product shot images... by duplicating each Street View example 10 times during training. Next, we apply transformations to product shot images to make them similar to those from Street View. ... we dynamically downsize each product shot image according to this distribution and rescale it to fit the input dimensions of the CNN. Finally... we use our validation data to measure the distribution of intersection over union (IOU) overlap... For each Street View bounding box input to the CNN, we randomly sample its source image according to this IOU distribution, simulating noisy detections during training.
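
The excerpt above elides parts of the training procedure, so the following Python sketch is only one possible reading of two of the three modifications (Street View oversampling and IOU-based noisy sampling), not the authors' implementation. All function names, the `(x1, y1, x2, y2)` box format, and the "pick the candidate box closest to a drawn IOU target" rule are assumptions made for illustration; the paper publishes no code, and the image downsizing step is omitted.

```python
import random


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_training_list(product_shots, street_view, street_view_copies=10):
    """Repeat each Street View example so product shots do not dominate.

    The factor of 10 comes from the quoted text; representing the training
    set as a plain list of examples is an assumption.
    """
    return list(product_shots) + list(street_view) * street_view_copies


def pick_noisy_box(gt_box, candidate_boxes, validation_ious, rng):
    """Simulate a noisy detection for one ground-truth box (hypothetical rule).

    Draws a target IOU from the overlaps measured on validation data, then
    returns the candidate box whose IOU with the ground truth is closest.
    """
    target = rng.choice(validation_ious)
    return min(candidate_boxes, key=lambda b: abs(iou(gt_box, b) - target))


# Example usage with made-up boxes and overlaps.
rng = random.Random(0)
train = build_training_list(["p1", "p2"], ["s1"])
gt = (0, 0, 2, 2)
candidates = [gt, (1, 1, 3, 3), (5, 5, 6, 6)]
noisy = pick_noisy_box(gt, candidates, validation_ious=[0.9, 0.7, 0.5], rng=rng)
```

Sampling the target from the measured validation overlaps, rather than from a fitted parametric distribution, keeps the sketch free of any assumption about the shape of the IOU distribution, which the excerpt does not describe.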