POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyridon Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate quantitatively the strengths of the proposed model on several open-vocabulary tasks: zero-shot 3D semantic segmentation using existing datasets; 3D grounding and retrieval of free-form language queries, using a small dataset that we propose as an extension of nuScenes.
Researcher Affiliation | Collaboration | Antonin Vobecky (1,2,3), Oriane Siméoni (1), David Hurych (1), Spyros Gidaris (1), Andrei Bursuc (1), Patrick Pérez (1), Josef Sivic (2); 1: valeo.ai, Paris, France; 2: CIIRC CTU in Prague; 3: FEE CTU in Prague
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | You can find the project page here: https://vobecant.github.io/POP3D.
Open Datasets | Yes | We use the nuScenes [10] dataset composed of 1000 sequences in total, divided into 700/150/150 scenes for train/val/test splits.
Dataset Splits | Yes | We use the nuScenes [10] dataset composed of 1000 sequences in total, divided into 700/150/150 scenes for train/val/test splits. Each sequence consists of 30-40 scenes, resulting in 28,130 training and 6,019 validation scenes.
Hardware Specification | Yes | We train our models on 8 A100 GPUs.
Software Dependencies | No | The paper mentions software components like "Adam optimizer", "ResNet-101", "MaskCLIP+", and "TPVFormer" but does not provide specific version numbers for any of them.
Experiment Setup | Yes | If not mentioned otherwise, we use the default learning rate of 2e-4, the Adam [30] optimizer, and a cosine learning rate scheduler with a final learning rate of 1e-6 and a linear warmup from a learning rate of 1e-5 for the first 500 iterations. Both prediction heads have two layers, i.e., N_occ = N_ft = 2, with C_occ = 512 and C_ft = 1024 feature channels. We put the same weight on the occupancy and feature losses, i.e., we set λ = 1 in Eq. 8.
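
The training configuration reported above maps onto standard PyTorch components. The following is a minimal sketch, assuming PyTorch: Adam at 2e-4, a 500-iteration linear warmup from 1e-5 followed by cosine decay to 1e-6, two-layer prediction heads with 512 and 1024 channels, and an equally weighted (λ = 1) sum of the occupancy and feature losses from the paper's Eq. 8. The head definitions, feature dimensions, total iteration count, and loss terms below are placeholders for illustration, not the released POP-3D code.

```python
# Hypothetical sketch of the reported training setup (not the official POP-3D code).
import torch
import torch.nn as nn

def make_head(in_ch: int, hidden_ch: int, out_ch: int) -> nn.Sequential:
    """Two-layer prediction head (N = 2), as stated in the experiment setup."""
    return nn.Sequential(
        nn.Linear(in_ch, hidden_ch),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_ch, out_ch),
    )

# Assumed dimensions: the voxel feature width and output sizes are illustrative.
voxel_feat_dim = 256
occ_head = make_head(voxel_feat_dim, 512, 2)    # C_occ = 512, binary occupancy
ft_head = make_head(voxel_feat_dim, 1024, 512)  # C_ft = 1024, language-aligned features

params = list(occ_head.parameters()) + list(ft_head.parameters())
optimizer = torch.optim.Adam(params, lr=2e-4)

# Linear warmup from 1e-5 to 2e-4 over the first 500 iterations,
# then cosine decay down to a final learning rate of 1e-6.
total_iters = 100_000  # placeholder for the actual training length
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-5 / 2e-4, total_iters=500)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_iters - 500, eta_min=1e-6)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[500])

lam = 1.0  # equal weighting of occupancy and feature losses (λ = 1 in Eq. 8)

def training_step(voxel_feats, occ_target, ft_target):
    occ_logits = occ_head(voxel_feats)
    ft_pred = ft_head(voxel_feats)
    # Placeholder loss terms standing in for the paper's Eq. 8.
    loss_occ = nn.functional.cross_entropy(occ_logits, occ_target)
    loss_ft = (1.0 - nn.functional.cosine_similarity(ft_pred, ft_target, dim=-1)).mean()
    loss = loss_occ + lam * loss_ft
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```

In this sketch, SequentialLR switches from the warmup schedule to the cosine schedule after 500 calls to scheduler.step(), matching the reported warmup length; the loss is the plain sum of the two terms since λ = 1.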