DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding

Authors: Xiaoxuan Yu, Hao Wang, Weiming Li, Qiang Wang, Soonyong Cho, Younghun Sung

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Qualitative and quantitative experimental results demonstrate that our method achieves state-of-the-art performance on the challenging ScanNet dataset.
Researcher Affiliation | Industry | Xiaoxuan Yu (1*), Hao Wang (1), Weiming Li (1), Qiang Wang (1), Soonyong Cho (2), Younghun Sung (2); 1: Samsung Research China Beijing, 2: Samsung Advanced Institute of Technology. Emails: xiaoxuan1.yu@samsung.com, hao1.wang@samsung.com, weiming.li@samsung.com, qiang.w@samsung.com, soonyong.cho@samsung.com, younghun.sung@samsung.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/SAITPublic/DOCTR.
Open Datasets | Yes | We follow exactly the same data split and pre-processing method as DIMR.
Dataset Splits | Yes | We follow exactly the same data split and pre-processing method as DIMR.
Hardware Specification | Yes | on a single Nvidia RTX A6000 GPU
Software Dependencies | No | The paper mentions the 'AdamW ... optimizer' and 'Minkowski Res16UNet34C', but does not provide specific version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch), or other libraries (e.g., CUDA) required for reproducibility.
Experiment Setup | Yes | During training, we use the AdamW (Loshchilov and Hutter 2017) optimizer for 600 epochs with a batch size of 5 on a single Nvidia RTX A6000 GPU for all the experiments. A one-cycle learning rate schedule (Smith and Topin 2019) is utilized with a maximum learning rate of 10^-4 and a minimum learning rate of 10^-6. Standard data augmentations are performed on the point cloud, including horizontal flipping, random rotation around the z-axis, elastic distortion, and random scaling. (A minimal sketch of this setup follows the table.)
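For reference, below is a minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row. The stand-in model, the steps-per-epoch value, the one-cycle div factors, and the augmentation scaling range are illustrative assumptions; only the optimizer choice, epoch count, batch size, and learning-rate endpoints come from the paper.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(3, 20)   # placeholder for the actual DOCTR network
epochs, batch_size = 600, 5      # values reported in the paper
steps_per_epoch = 100            # placeholder; equals len(train_loader) in practice

optimizer = AdamW(model.parameters(), lr=1e-4)

# One-cycle schedule peaking at 1e-4; the div factors are assumptions chosen so
# the final learning rate matches the paper's stated minimum of 1e-6.
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-4,
    total_steps=epochs * steps_per_epoch,
    div_factor=10,        # initial lr = 1e-4 / 10 = 1e-5 (assumed)
    final_div_factor=10,  # final lr = 1e-5 / 10 = 1e-6
)

def augment(points: torch.Tensor) -> torch.Tensor:
    """Sketch of the listed point-cloud augmentations on an (N, 3) tensor.
    Elastic distortion is omitted; the scaling range is an assumption."""
    if torch.rand(()) < 0.5:                     # horizontal flipping
        points = points * torch.tensor([-1.0, 1.0, 1.0])
    theta = torch.rand(()).item() * 2 * math.pi  # random rotation around z-axis
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    points = points @ rot.T
    return points * (0.9 + 0.2 * torch.rand(())) # random scaling in [0.9, 1.1]
```

In a real training loop, scheduler.step() would be called after every optimizer step so the one-cycle schedule advances per iteration rather than per epoch.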