VoxDet: Voxel Learning for Novel Instance Detection

Authors: Bowen Li, Jiashun Wang, Yaoyu Hu, Chen Wang, Sebastian Scherer

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Exhaustive experiments are conducted on the demanding LineMod-Occlusion, YCB-Video, and RoboTools benchmarks, where VoxDet outperforms various 2D baselines remarkably with faster speed.
Researcher Affiliation | Academia | 1 Carnegie Mellon University; 2 State University of New York at Buffalo
Pseudocode | No | The paper describes the method using text, diagrams, and equations but does not include structured pseudocode or an algorithm block.
Open Source Code | Yes | Our code, data, raw results, and pre-trained models are public at https://github.com/Jaraxxus-Me/VoxDet.
Open Datasets | Yes | Synthetic Training set: In response to the scarcity of instance detection training sets, we've compiled a comprehensive synthetic dataset using 9,901 objects from ShapeNet [15] and ABO [16]. Synthetic-Real Test set: We utilize two authoritative benchmarks for testing. LineMod-Occlusion [17] (LM-O) features 8 texture-less instances and 1,514 box annotations... The YCB-Video [18] (YCB-V) contains 21 instances and 4,125 target boxes...
Dataset Splits | No | The paper mentions 'training' and 'test' sets for datasets like OWID, LM-O, and YCB-V. For OWID, it states '55,000 scenes with 180,000 boxes for training and an additional 500 images for evaluation,' but it does not specify an explicit validation split or percentages for reproducing the data partitioning. (A hedged split sketch is given after the table.)
Hardware Specification | Yes | The reconstruction stage of VoxDet was trained on a single Nvidia V100 GPU over a period of 6 hours, while the detection training phase utilized four Nvidia V100 GPUs for a span of 40 hours. Inferences were conducted on a single V100 GPU to ensure fair efficiency comparison.
Software Dependencies | No | The paper mentions specific library functions (e.g., 'torch.nn.functional.affine_grid()') but does not provide version numbers for software dependencies such as PyTorch or BlenderProc. (An illustrative affine_grid sketch is given after the table.)
Experiment Setup | Yes | In the first reconstruction stage, we set the loss weights as w_recon = 10.0, w_gan = 0.01, w_percep = 1.0. The model is trained for 16 epochs on the 9,600 instances from the OWID dataset. We leveraged the Adam optimizer [47] with a base learning rate of 5e-5 during training. In the second detection stage, the loss weights are set as w1 = w2 = w3 = w4 = w5 = 1.0 and w6 = 0 in the first 10 epochs, where SGD is leveraged as an optimizer with a 0.02 base learning rate. During testing, we supplied each model with the same set of M = 10 template images per instance, and all methods employed the top N = 500 ranking proposals for matching. (A hyperparameter sketch is given after the table.)
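
Since the paper reports only train/evaluation counts for OWID (55,000 training scenes, 500 evaluation images) and no validation split, the following is a minimal sketch of how one might hold out a validation subset when reproducing the partitioning. The scene-directory layout (data/OWID/train), the 5% fraction, and the fixed seed are assumptions, not details from the paper.

```python
import random
from pathlib import Path

# Hypothetical layout: one directory per OWID scene (this path is an assumption).
OWID_ROOT = Path("data/OWID/train")

def make_splits(root: Path, val_fraction: float = 0.05, seed: int = 0):
    """Deterministically hold out a validation subset from the training scenes.

    The paper only reports train/eval counts for OWID, so the validation
    fraction and directory structure here are illustrative assumptions.
    """
    scenes = sorted(p.name for p in root.iterdir() if p.is_dir())
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(scenes)
    n_val = int(len(scenes) * val_fraction)
    return {"val": scenes[:n_val], "train": scenes[n_val:]}

if __name__ == "__main__":
    splits = make_splits(OWID_ROOT)
    print(len(splits["train"]), "train scenes,", len(splits["val"]), "val scenes")
```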
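The paper mentions torch.nn.functional.affine_grid(), which in PyTorch is paired with grid_sample to warp feature grids. The sketch below shows that generic API applied to a 3D voxel feature tensor with a rotation matrix; the tensor sizes, the rotate_voxels helper, and the identity-rotation usage are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def rotate_voxels(voxels: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    """Resample a voxel feature grid under a rotation.

    voxels: (N, C, D, H, W) feature grid; R: (N, 3, 3) rotation matrices.
    This is a generic sketch of the affine_grid / grid_sample API the paper
    cites, not VoxDet's exact code.
    """
    N = voxels.shape[0]
    # Build (N, 3, 4) affine matrices: rotation only, zero translation.
    theta = torch.cat([R, torch.zeros(N, 3, 1, device=voxels.device)], dim=2)
    grid = F.affine_grid(theta, size=voxels.shape, align_corners=False)
    return F.grid_sample(voxels, grid, align_corners=False)

# Tiny usage example with an identity rotation (output matches input up to interpolation).
vox = torch.randn(1, 8, 16, 16, 16)
out = rotate_voxels(vox, torch.eye(3).unsqueeze(0))
print(out.shape)  # torch.Size([1, 8, 16, 16, 16])
```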
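The reported hyperparameters can also be collected into a small configuration sketch. The numbers come from the quoted setup; the dictionary names, the placeholder parameter list, and the optimizer construction are assumptions made only to show the settings in runnable form.

```python
import torch

# Hyperparameters as reported in the paper; the surrounding structure is illustrative.
RECON_STAGE = {
    "loss_weights": {"recon": 10.0, "gan": 0.01, "percep": 1.0},
    "epochs": 16,
    "optimizer": "Adam",
    "base_lr": 5e-5,
}
DETECTION_STAGE = {
    # w6 is kept at 0 for the first 10 epochs, per the paper.
    "loss_weights": {"w1": 1.0, "w2": 1.0, "w3": 1.0, "w4": 1.0, "w5": 1.0, "w6": 0.0},
    "optimizer": "SGD",
    "base_lr": 0.02,
}
TEST = {"templates_per_instance": 10, "top_proposals": 500}  # M and N

# Hypothetical optimizer construction matching the reported settings.
params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
recon_opt = torch.optim.Adam(params, lr=RECON_STAGE["base_lr"])
det_opt = torch.optim.SGD(params, lr=DETECTION_STAGE["base_lr"])
```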