Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image

Authors: Yuki Kawana, Tatsuya Harada

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on both synthetic and real data demonstrates that our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
Researcher Affiliation | Academia | Yuki Kawana (1), Tatsuya Harada (1,2); (1) The University of Tokyo, (2) RIKEN AIP; {kawana, harada}@mi.t.u-tokyo.ac.jp
Pseudocode | Yes | Algorithm 1: Kinematic-aware part fusion (KPF) ... Algorithm 2: Part Fusion with kIoU (PF-kIoU)
Open Source Code | No | The paper mentions using publicly available code for baselines (3DETR, A-SDF, OPD) but does not explicitly state that the code for the proposed method is open source or provide a link to it.
Open Datasets | Yes | We evaluate our method on both synthetic and real-world data. We use the SAPIEN [58] dataset for synthetic data evaluation, following recent works on articulated shape reconstruction [24, 56, 15]. For real-world data, we use the BMVC [36] dataset for quantitative evaluation.
Dataset Splits | Yes | We generated 188,726 images for training and validation. Due to computational and time constraints, we used 20,000 images for training and kept the rest for validation. We also generated 4,000 images for the test split.
Hardware Specification | Yes | Training was performed on two A100 GPUs, each with 40GB of GPU memory, with a batch size of 26 per GPU.
Software Dependencies | No | The paper mentions several software components and architectures, such as AdamW, 3DETR, ConvONet, ResNeXt50, and DeepLabV3+, but it does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use the AdamW [31] optimizer with a base learning rate of 9e-4 and employ a cosine scheduler with nine warm-up epochs. Training was performed on two A100 GPUs, each with 40GB of GPU memory, with a batch size of 26 per GPU. The number of decoder layers for D and R is set to N_D = 6 and N_R = 2, respectively. We set the weights of the matching cost C_match to λ1 = 8, λ2 = 10, λ3 = 1, λ4 = 5.
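
The sketches below illustrate three of the reported items; none of them is the authors' code. First, the Pseudocode row cites Algorithm 2, Part Fusion with kIoU (PF-kIoU), which fuses duplicate part proposals using an IoU criterion. The following is a minimal NMS-style sketch of IoU-based part fusion; the axis-aligned box representation, the joint_type key, and the 0.5 threshold are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def box3d_iou(a, b):
    """Axis-aligned 3D IoU between boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter + 1e-9)

def part_fusion_kiou(proposals, iou_thresh=0.5):
    """Greedy NMS-style fusion: keep the highest-scoring proposal and suppress
    overlapping proposals that share the same kinematic (joint) type.
    `proposals` is a list of dicts with keys 'box', 'score', 'joint_type'."""
    order = sorted(proposals, key=lambda p: p["score"], reverse=True)
    kept = []
    for p in order:
        duplicate = any(
            p["joint_type"] == q["joint_type"]
            and box3d_iou(p["box"], q["box"]) > iou_thresh
            for q in kept
        )
        if not duplicate:
            kept.append(p)
    return kept
```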
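Second, the reported split sizes (20,000 of 188,726 generated images for training, the remainder for validation, plus a separately generated 4,000-image test split) can be written down as a short sketch. The random shuffle and fixed seed below are assumptions; the quoted text does not state how the 20,000 training images were selected.

```python
import random

# Split sizes reported in the paper.
NUM_GENERATED = 188_726   # images generated for training and validation
NUM_TRAIN = 20_000        # subset used for training
NUM_TEST = 4_000          # test split, generated separately

rng = random.Random(0)    # seed is an assumption; the paper does not state one
indices = list(range(NUM_GENERATED))
rng.shuffle(indices)
train_ids = indices[:NUM_TRAIN]
val_ids = indices[NUM_TRAIN:]  # 168,726 images kept for validation
```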
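Third, the quoted experiment setup (AdamW, base learning rate 9e-4, cosine schedule with nine warm-up epochs, batch size 26 per GPU) maps onto a standard PyTorch recipe. In the sketch below, the total epoch count, the per-epoch LambdaLR stepping, and the placeholder model and cost terms are assumptions not given in the quote.

```python
import math
import torch

model = torch.nn.Linear(8, 8)  # placeholder for the actual network

# Hyperparameters reported in the paper.
BASE_LR = 9e-4
WARMUP_EPOCHS = 9
TOTAL_EPOCHS = 90          # assumption: total epochs are not stated in the quote
BATCH_SIZE_PER_GPU = 26    # on each of two A100 40GB GPUs

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch):
    # Linear warm-up for the first nine epochs, cosine decay afterwards.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / max(1, TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Matching-cost weights from the paper: λ1=8, λ2=10, λ3=1, λ4=5.
# The individual cost terms c1..c4 are placeholders here.
LAMBDAS = (8.0, 10.0, 1.0, 5.0)
def matching_cost(c1, c2, c3, c4):
    return sum(w * c for w, c in zip(LAMBDAS, (c1, c2, c3, c4)))
```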