POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Authors: Antonin Vobecky, Oriane Siméoni, David Hurych, Spyros Gidaris, Andrei Bursuc, Patrick Pérez, Josef Sivic
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate quantitatively the strengths of the proposed model on several open-vocabulary tasks: zero-shot 3D semantic segmentation using existing datasets; 3D grounding and retrieval of free-form language queries, using a small dataset that we propose as an extension of nuScenes. |
| Researcher Affiliation | Collaboration | Antonin Vobecky (1,2,3), Oriane Siméoni (1), David Hurych (1), Spyros Gidaris (1), Andrei Bursuc (1), Patrick Pérez (1), Josef Sivic (2). 1: valeo.ai, Paris, France; 2: CIIRC CTU in Prague; 3: FEE CTU in Prague |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | You can find the project page here https://vobecant.github.io/POP3D. |
| Open Datasets | Yes | We use the nuScenes [10] dataset composed of 1000 sequences in total, divided into 700/150/150 scenes for train/val/test splits. |
| Dataset Splits | Yes | We use the nuScenes [10] dataset composed of 1000 sequences in total, divided into 700/150/150 scenes for train/val/test splits. Each sequence consists of 30-40 scenes, resulting in 28,130 training and 6,019 validation scenes. |
| Hardware Specification | Yes | We train our models on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer", "ResNet-101", "MaskCLIP+", and "TPVFormer" but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | If not mentioned otherwise, we use the default learning rate of 2e-4, Adam [30] optimizer, and a cosine learning rate scheduler with final learning rate 1e-6, and with linear warmup from 1e-5 learning rate for the first 500 iterations. Both prediction heads have two layers, i.e., N_occ = N_ft = 2, with C_occ = 512 and C_ft = 1024 feature channels. We put the same weight on the occupancy and feature losses, i.e., we set λ = 1 in Eq. 8. |
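
The Experiment Setup row can be summarized in a minimal PyTorch sketch, shown below. The function and variable names (`build_optimizer_and_scheduler`, `total_loss`, `loss_occ`, `loss_ft`) are illustrative placeholders rather than the authors' code; only the hyperparameter values (Adam, base learning rate 2e-4, cosine decay to 1e-6, linear warmup from 1e-5 over the first 500 iterations, λ = 1 for the combined loss) are taken from the paper.

```python
# Hedged sketch of the quoted training configuration; not the authors' implementation.
import math
import torch


def build_optimizer_and_scheduler(model, total_iters,
                                  base_lr=2e-4, final_lr=1e-6,
                                  warmup_start_lr=1e-5, warmup_iters=500):
    """Adam + cosine schedule with linear warmup, per the quoted setup."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

    def lr_lambda(it):
        if it < warmup_iters:
            # Linear warmup from 1e-5 to the base learning rate of 2e-4.
            start = warmup_start_lr / base_lr
            return start + (1.0 - start) * it / warmup_iters
        # Cosine decay from the base learning rate down to 1e-6.
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return final_lr / base_lr + (1.0 - final_lr / base_lr) * cosine

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


def total_loss(loss_occ, loss_ft, lam=1.0):
    # Equal weighting of the occupancy and feature losses (λ = 1 in Eq. 8).
    return loss_occ + lam * loss_ft
```

Calling `scheduler.step()` once per training iteration matches the per-iteration warmup described in the quote; how the two loss terms are computed is not specified beyond the weighting, so `loss_occ` and `loss_ft` are left as inputs here.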