Group-Pair Convolutional Neural Networks for Multi-View Based 3D Object Retrieval
Authors: Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three public benchmarks show that our GPCNN solution significantly outperforms the state-of-the-art methods with 3% to 42% improvement in retrieval accuracy. |
| Researcher Affiliation | Academia | Zan Gao (1,2), Deyu Wang (1,2), Xiangnan He (3), Hua Zhang (1,2). (1) Key Laboratory of Computer Vision and System (Ministry of Education), Tianjin University of Technology, Tianjin 300384, China; (2) Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, China; (3) School of Computing, National University of Singapore, 117417, Singapore |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | In our experiments, three widely used datasets are employed, where each object in the gallery set is first represented by a free set of views, meaning that the views can be captured from any direction without camera constraints. The details of these datasets are as follows: the ETH 3D object dataset (Ess et al. 2008) contains 80 objects in 8 categories, with 41 view images per object; the NTU-60 3D model dataset (Chen et al. 2003) contains 549 objects in 47 categories, with 60 view samples per object; the MVRED 3D category dataset (Liu et al. 2016) contains 505 objects in 61 categories, with 36 view images per object. (These statistics are collected in a small metadata sketch after this table.) |
| Dataset Splits | Yes | For each dataset, the first 80% of views of each object is used as the gallery set and the remaining views of each object are used as the query set. When building the training and validation datasets, group-pair samples are chosen from the gallery set. The proportion of positive to negative group-pair samples is 1:3 for all datasets. In ETH, 10,000 positive group-pair samples and 30,000 negative group-pair samples are produced from the ETH gallery set as training samples. In NTU and MVRED, 30,000 positive group-pair samples and 90,000 negative group-pair samples are collected from the NTU gallery set as training samples. For the validation dataset, the same scheme is employed. (A minimal sampling sketch after this table illustrates the protocol.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions VGG-16 and Siamese CNN, but does not provide specific software dependency versions (e.g., library names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In our experiments, for GPCNN, when it is applied to the NTU and MVRED datasets, the sizes of the convolution kernels of each layer in CNN Part I are 32, 64, 128, 256 and 512, respectively; when it is evaluated on the ETH dataset, the sizes of the convolution kernels of each layer in CNN Part I are 16, 32, 64, 128 and 256, respectively. For VGG-16 and the Siamese convolutional neural network, the parameters are pre-trained on the ImageNet dataset, and each 3D dataset is then used to fine-tune the parameters with default settings. The proportion of positive to negative group-pair samples is 1:3 for all datasets. In ETH, 10,000 positive group-pair samples and 30,000 negative group-pair samples are produced from the ETH gallery set as training samples. In NTU and MVRED, 30,000 positive group-pair samples and 90,000 negative group-pair samples are collected from the NTU gallery set as training samples. (A configuration sketch for CNN Part I follows this table.) |
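
The dataset statistics quoted in the Open Datasets row can be kept in a small metadata record during a reproduction attempt. The figures below come from the paper; the dictionary layout and variable names are only a convenience of this sketch (Python is used for all sketches here).

```python
# Benchmark statistics as reported in the paper; the layout and names
# are assumptions of this sketch, not part of any release by the authors.
DATASETS = {
    "ETH":    {"objects": 80,  "categories": 8,  "views_per_object": 41, "ref": "Ess et al. 2008"},
    "NTU-60": {"objects": 549, "categories": 47, "views_per_object": 60, "ref": "Chen et al. 2003"},
    "MVRED":  {"objects": 505, "categories": 61, "views_per_object": 36, "ref": "Liu et al. 2016"},
}

for name, meta in DATASETS.items():
    total = meta["objects"] * meta["views_per_object"]
    print(f"{name} ({meta['ref']}): {meta['objects']} objects, "
          f"{meta['categories']} categories, {total} views in total")
```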
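
The split and sampling protocol in the Dataset Splits row (first 80% of each object's views as gallery, remainder as query; group pairs drawn at a 1:3 positive-to-negative ratio) could be reproduced along the lines below. This is a minimal sketch under stated assumptions: the `{object_id: [view, ...]}` input layout, the label map, the `group_size` value and the helper names are hypothetical, since the quoted excerpt does not say how many views form a group.

```python
import random

def split_views(objects, gallery_frac=0.8):
    """Per-object split: the first 80% of views go to the gallery set, the rest to the query set."""
    gallery, query = {}, {}
    for obj_id, views in objects.items():
        cut = int(len(views) * gallery_frac)
        gallery[obj_id], query[obj_id] = views[:cut], views[cut:]
    return gallery, query

def sample_group_pairs(gallery, labels, n_positive, group_size=4, neg_ratio=3, seed=0):
    """Draw n_positive same-category group pairs and neg_ratio times as many
    different-category pairs (1:3 by default, as stated in the paper)."""
    rng = random.Random(seed)
    by_cat = {}
    for obj_id, cat in labels.items():
        by_cat.setdefault(cat, []).append(obj_id)
    cats = list(by_cat)

    def group(obj_id):
        # A "group" here is a random subset of an object's gallery views (size assumed).
        return rng.sample(gallery[obj_id], min(group_size, len(gallery[obj_id])))

    pairs = []
    for _ in range(n_positive):              # positives: two objects of the same category
        cat = rng.choice(cats)
        a, b = rng.choice(by_cat[cat]), rng.choice(by_cat[cat])
        pairs.append((group(a), group(b), 1))
    for _ in range(n_positive * neg_ratio):  # negatives: objects of different categories
        cat_a, cat_b = rng.sample(cats, 2)
        a, b = rng.choice(by_cat[cat_a]), rng.choice(by_cat[cat_b])
        pairs.append((group(a), group(b), 0))
    rng.shuffle(pairs)
    return pairs
```

With the ETH numbers, `sample_group_pairs(gallery, labels, n_positive=10_000)` would yield 10,000 positive and 30,000 negative training pairs; the validation set would be drawn the same way.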
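
One plausible reading of the layer widths in the Experiment Setup row is that they are per-layer filter counts for five convolutional layers of CNN Part I (32 to 512 for NTU/MVRED, 16 to 256 for ETH). The PyTorch sketch below encodes that reading only; the 3x3 kernels, padding, and 2x2 max-pooling are placeholder assumptions not given in the excerpt, and this is not the authors' implementation.

```python
import torch.nn as nn

# Per-layer filter counts as listed in the paper; every other architectural
# choice below (kernel size, padding, pooling) is an assumption of this sketch.
CNN_PART1_CHANNELS = {
    "ntu":   [32, 64, 128, 256, 512],
    "mvred": [32, 64, 128, 256, 512],
    "eth":   [16, 32, 64, 128, 256],
}

def make_cnn_part1(dataset="ntu", in_channels=3):
    layers, prev = [], in_channels
    for out_channels in CNN_PART1_CHANNELS[dataset]:
        layers += [
            nn.Conv2d(prev, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        ]
        prev = out_channels
    return nn.Sequential(*layers)

# Example: backbone for the ETH configuration (16, 32, 64, 128, 256 filters).
eth_backbone = make_cnn_part1("eth")
```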