AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Authors: Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult, Francis Engelmann, Bastian Leibe, Konrad Schindler, Theodora Kontogianni
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies. |
| Researcher Affiliation | Collaboration | Yuanwen Yue (1,2), Sabarinath Mahadevan (3), Jonas Schult (3), Francis Engelmann (2,4), Bastian Leibe (3), Konrad Schindler (1,2), Theodora Kontogianni (2). Affiliations: 1 Photogrammetry and Remote Sensing, ETH Zurich; 2 ETH AI Center, ETH Zurich; 3 Computer Vision Group, RWTH Aachen University; 4 Google |
| Pseudocode | No | The paper provides detailed architectural descriptions and flowcharts, but it does not include any formal pseudocode blocks or algorithms with numbered steps or code-like formatting. |
| Open Source Code | No | We will release the source code of AGILE3D as well as the annotation tool to facilitate future research. |
| Open Datasets | Yes | We train on a single dataset, ScanNet v2-Train (Dai et al., 2017), and then evaluate on ScanNet v2-Val (incl. ScanNet20 and ScanNet40) (Dai et al., 2017), S3DIS (Armeni et al., 2016), and KITTI-360 (Liao et al., 2022). |
| Dataset Splits | Yes | ScanNet v2 (Dai et al., 2017) is a richly annotated dataset of 3D indoor scenes, covering diverse room types such as offices, hotels, and living rooms. It contains 1202 scenes for training and 312 scenes for validation, as well as 100 hidden test scenes. |
| Hardware Specification | Yes | Time is measured on a single TITAN RTX GPU. We use a single TITAN RTX GPU with 24GB memory for training. |
| Software Dependencies | No | The paper mentions software components such as 'Minkowski Res16UNet34C', the 'AdamW optimizer', and 'Open3D', but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We set λCE = 1 and λDice = 2 in the loss function. The loss is applied to every intermediate layer of the click attention module. We use the AdamW optimizer (Loshchilov & Hutter, 2019) with a weight decay factor of 1e-4. We train the model on ScanNet40 for 1100 epochs with an initial learning rate of 1e-4, which is decayed by 0.1 after 1000 epochs. Due to the smaller data size, we train the model on ScanNet20 for 850 epochs with an initial learning rate of 1e-4, which is decayed by 0.1 after 800 epochs. |
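The experiment-setup row above can be summarized as a small sketch. The function and constant names (`combined_loss`, `lr_at_epoch`, `LAMBDA_CE`, `LAMBDA_DICE`) are illustrative, not from the paper; only the numeric values (loss weights, base learning rate, decay factor, and decay epochs) come from the quoted setup.

```python
# Hypothetical sketch of the reported hyper-parameters; names are ours.
LAMBDA_CE = 1.0    # weight for the cross-entropy term (paper: lambda_CE = 1)
LAMBDA_DICE = 2.0  # weight for the Dice term (paper: lambda_Dice = 2)

def combined_loss(ce_loss, dice_loss):
    """Weighted sum L = lambda_CE * L_CE + lambda_Dice * L_Dice."""
    return LAMBDA_CE * ce_loss + LAMBDA_DICE * dice_loss

def lr_at_epoch(epoch, base_lr=1e-4, decay_epoch=1000, gamma=0.1):
    """Step schedule: base_lr until decay_epoch, then multiplied by gamma.

    decay_epoch=1000 matches the ScanNet40 run (1100 epochs total);
    use decay_epoch=800 for the ScanNet20 run (850 epochs total).
    """
    return base_lr * (gamma if epoch >= decay_epoch else 1.0)
```

For example, `lr_at_epoch(500)` returns the base rate 1e-4, while `lr_at_epoch(1050)` returns the decayed rate 1e-5, matching the ScanNet40 schedule described in the row.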