Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image
Authors: Yuki Kawana, Tatsuya Harada
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on both synthetic and real data demonstrates that our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation. |
| Researcher Affiliation | Academia | Yuki Kawana1, Tatsuya Harada1,2 1The University of Tokyo, 2RIKEN AIP {kawana, harada}@mi.t.u-tokyo.ac.jp |
| Pseudocode | Yes | Algorithm 1 Kinematic-aware part fusion (KPF) ... Algorithm 2 Part Fusion with kIoU (PF-kIoU) |
| Open Source Code | No | The paper mentions using publicly available code for baselines (3DETR, A-SDF, OPD) but does not explicitly state that the code for their proposed method is open-source or provide a link to it. |
| Open Datasets | Yes | We evaluate our method on both synthetic and real-world data. We use the SAPIEN [58] dataset for synthetic data evaluation, following recent works on articulated shape reconstruction [24, 56, 15]. For real-world data, we use the BMVC [36] dataset for quantitative evaluation. |
| Dataset Splits | Yes | We generated 188,726 images for training and validation. Due to computational and time constraints, we used 20,000 images for training and kept the rest for validation usage. Also, we generated 4,000 images for the test split. |
| Hardware Specification | Yes | The training was performed using two A100 GPUs, each with 40 GB of GPU memory, and with a batch size of 26 for each GPU. |
| Software Dependencies | No | The paper mentions several software components and architectures like AdamW, 3DETR, ConvONet, ResNeXt50, and DeepLabV3Plus, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use the AdamW [31] optimizer with a base learning rate of 9e-4 and employ a cosine scheduler with nine epochs for warm-up. The training was performed using two A100 GPUs, each with 40 GB of GPU memory, and with a batch size of 26 for each GPU. The number of decoder layers for D and R are set to ND = 6 and NR = 2, respectively. We set the weights for the matching cost Cmatch at λ1 = 8, λ2 = 10, λ3 = 1, λ4 = 5. |
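The reported optimizer settings (AdamW, base learning rate 9e-4, cosine scheduler with nine warm-up epochs) can be sketched as a learning-rate schedule function. This is a minimal sketch: linear warm-up and decay to zero are assumptions, and the total epoch count is not stated in the paper, so it is left as a parameter.

```python
import math

BASE_LR = 9e-4      # base learning rate reported in the paper
WARMUP_EPOCHS = 9   # warm-up length reported in the paper

def lr_at_epoch(epoch, total_epochs, base_lr=BASE_LR, warmup=WARMUP_EPOCHS):
    """Learning rate under assumed linear warm-up followed by cosine decay."""
    if epoch < warmup:
        # linear ramp from base_lr/warmup up to the full base learning rate
        return base_lr * (epoch + 1) / warmup
    # cosine decay from base_lr toward zero over the remaining epochs
    progress = (epoch - warmup) / max(1, total_epochs - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop, the same shape could be obtained by wrapping this function in `torch.optim.lr_scheduler.LambdaLR` around an `AdamW` optimizer.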
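The matching-cost weights λ1 = 8, λ2 = 10, λ3 = 1, λ4 = 5 enter Cmatch as a weighted sum of per-term costs. A minimal sketch, with the individual cost terms C1..C4 left abstract (their definitions are not quoted in this report, so the function below is purely illustrative):

```python
# Matching-cost weights λ1..λ4 as reported in the paper
LAMBDAS = (8.0, 10.0, 1.0, 5.0)

def matching_cost(terms, lambdas=LAMBDAS):
    """Weighted sum of individual cost terms (hypothetical C1..C4 values)."""
    assert len(terms) == len(lambdas), "expected one weight per cost term"
    return sum(lam * c for lam, c in zip(lambdas, terms))
```

With all four cost terms equal to 1, the combined cost is simply the sum of the weights, 24.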