Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Learning to Complete Anything in Lidar
Authors: Ayça Takmaz, Cristiano Saltori, Neehar Peri, Tim Meinhardt, Riccardo De Lutio, Laura Leal-Taixé, Aljosa Osep
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our method is versatile and can be used for semantic (Behley et al., 2019; Li et al., 2024) and panoptic scene completion (Cao et al., 2024) (Fig. 1, 1) via test-time prompts. Moreover, we qualitatively show that our approach can localize objects as 3D bounding boxes (Fig. 1, 2) and demonstrate that our method can recognize and complete arbitrary objects not captured in canonical semantic vocabularies (Fig. 1, 3). We evaluate CAL's ability to localize and complete the full 3D extent of instances from a Lidar point cloud given a class vocabulary prompt at test time. Key details on the experimental setup (Sec. 4.1), benchmark comparisons (Sec. 4.2), and ablations on design choices (Sec. 4.3) are discussed below, with further implementation details in the Appendix. |
| Researcher Affiliation | Collaboration | Ayça Takmaz 1,2, Cristiano Saltori 1, Neehar Peri 1,3, Tim Meinhardt 1, Riccardo de Lutio 1, Laura Leal-Taixé 1, Aljosa Osep 1. Work done during a research internship at NVIDIA. 1 NVIDIA, 2 ETH Zurich, 3 Carnegie Mellon University (CMU). Correspondence to: Ayça Takmaz <EMAIL>. |
| Pseudocode | No | The paper describes the methodology in narrative text and refers to figures, but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing their own code, nor does it provide a specific repository link for the methodology described. |
| Open Datasets | Yes | We follow prior work (Cao et al., 2024) and evaluate CAL on two datasets that provide semantic and instance-level labels for PSC: SSCBench KITTI360 (Li et al., 2024; Liao et al., 2021) and Semantic KITTI (Behley et al., 2019; Geiger et al., 2012; 2013) whose instance-level labels are provided by Cao et al. (2024). |
| Dataset Splits | Yes | More specifically, for Semantic KITTI, we use one scan in every 5 scans, resulting in a total of 4649 pseudo-label samples. For KITTI-360, we use the Lidar scan IDs used in SSCBench-KITTI360, resulting in 8487 training scans, 1780 validation scans, and 2165 test scans as pseudo-labels. |
| Hardware Specification | Yes | We train the model for 50 epochs on 8 NVIDIA A100 GPUs, using a batch size of 8 with 1 item per GPU, and a learning rate of 0.0001. |
| Software Dependencies | No | The paper mentions software components like 'pydensecrf', 'Cylinder3D', and 'Mask2Former' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | We train the model for 50 epochs on 8 NVIDIA A100 GPUs, using a batch size of 8 with 1 item per GPU, and a learning rate of 0.0001. As also described earlier in A.4, we follow (Behley et al., 2019) and define a voxel grid volume extending 51.2 m forward, 25.6 m to the sides, and 6.4 m in height, with a voxel size of 0.2 m. We adopt this setting for both model training and inference. The total loss is formulated as L_total = λ_occ L_occ + λ_prot L_prot + λ_mask L_mask + λ_CLIP L_CLIP + L_aux (Eq. 1), where each λ is a scalar weight: λ_compl = 1.0, λ_mask = 40.0, λ_CLIP = 1.0, λ_prot = 1.0. |
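The quoted experiment setup can be made concrete with a short sketch: the stated extents (51.2 m forward, 25.6 m to each side, 6.4 m height, 0.2 m voxels) fix the grid resolution, and the reported λ weights combine the individual losses. This is an illustration under our own naming (`voxel_grid_shape`, `total_loss`), not the authors' code; note the paper writes λ_occ in the equation but lists the weight as λ_compl, which we treat as the same term here.

```python
# Sketch of the CAL training configuration quoted above.
# Function names and the grid-shape rounding are assumptions, not from the paper's code.

VOXEL_SIZE = 0.2  # meters, as reported

def voxel_grid_shape(forward=51.2, side=25.6, height=6.4, voxel=VOXEL_SIZE):
    """Return (x, y, z) voxel counts; the lateral extent spans 25.6 m to *each* side."""
    return (round(forward / voxel), round(2 * side / voxel), round(height / voxel))

# Reported scalar weights (the paper lists the occupancy/completion weight as lambda_compl).
WEIGHTS = {"occ": 1.0, "prot": 1.0, "mask": 40.0, "clip": 1.0}

def total_loss(losses, aux=0.0, weights=WEIGHTS):
    """L_total = sum_k lambda_k * L_k + L_aux, matching Eq. (1)."""
    return sum(weights[k] * losses[k] for k in weights) + aux
```

With these extents the grid works out to 256 x 256 x 32 voxels, consistent with the SemanticKITTI scene-completion convention the paper follows.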