MFOS: Model-Free & One-Shot Object Pose Estimation
Authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments and report state-of-the-art one-shot performance on the challenging LINEMOD benchmark. Finally, extensive ablations allow us to determine good practices with this relatively new type of architecture in the field. |
| Researcher Affiliation | Collaboration | Jong Min Lee¹, Yohann Cabon², Romain Brégier², Sungjoo Yoo¹, Jerome Revaud² (¹Seoul National University, ²Naver Labs Europe) |
| Pseudocode | No | The paper describes its architecture and procedures textually and with diagrams but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | To ensure the generalization capability of our model, we train it on a diverse set of datasets covering a large panel of diversity. Specifically, we choose the large-scale ABO dataset (Collins et al. 2022)... We also use for training some datasets of the BOP challenge (Hodan et al. 2018)... Additionally, we incorporate the One Pose dataset (Sun et al. 2022)... |
| Dataset Splits | Yes | For the evaluation, we use the standard train-test split proposed in (Li, Wang, and Ji 2019) and follow the protocol defined in One Pose++ (He et al. 2023)... For the Onepose (Sun et al. 2022) and ABO (Collins et al. 2022) datasets, we use the official test splits as well. |
| Hardware Specification | Yes | We report median computation times obtained with an NVIDIA V100 GPU, repeating measures 10 times for robustness. *(See the timing sketch below this table.)* |
| Software Dependencies | No | The paper mentions several tools and models used (e.g., ViT-Base/16, AdamW, CroCo v2, SQ-PnP, YOLOv5) but does not provide specific version numbers for underlying software libraries or programming environments (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Network architecture and training hyper-parameters. We use a ViT-Base/16 (Dosovitskiy et al. 2021) for the image encoder... We use relative positional encoding (RoPE (Su et al. 2021))... We train our network with AdamW with β = (0.9, 0.95) and a cosine-decaying learning rate going from 10⁻⁴ to 10⁻⁶. We initialize the network weights using CroCo v2... During training, we feed the network with batches of 16 × 48 = 768 images... *(See the optimizer sketch below this table.)* |
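
The timing protocol quoted in the Hardware Specification row (median over 10 repeated measurements on a GPU) is straightforward to replicate. Below is a minimal PyTorch sketch assuming a CUDA-capable device; the model and input tensor are stand-in placeholders, not the MFOS network.

```python
import time
import statistics

import torch

# Placeholders: any model and input batch would do here.
model = torch.nn.Linear(768, 768).cuda().eval()
inputs = torch.randn(1, 768, device="cuda")

timings = []
with torch.no_grad():
    for _ in range(10):  # the paper repeats measures 10 times for robustness
        torch.cuda.synchronize()  # ensure prior GPU work has finished
        start = time.perf_counter()
        model(inputs)
        torch.cuda.synchronize()  # wait for the forward pass to complete
        timings.append(time.perf_counter() - start)

print(f"median latency: {statistics.median(timings) * 1e3:.2f} ms")
```

Synchronizing before and after the forward pass matters: CUDA kernels launch asynchronously, so timing without it would measure launch overhead rather than actual compute.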
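The quoted training hyper-parameters map directly onto standard PyTorch components. The sketch below is an illustration under stated assumptions: `total_steps`, the weight decay, and the placeholder model are not specified in the excerpt.

```python
import torch

model = torch.nn.Linear(768, 768)  # placeholder for the ViT-Base/16 encoder

# AdamW with beta = (0.9, 0.95) as quoted; weight decay is not given in the excerpt.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.95))

# Cosine-decaying learning rate from 1e-4 down to 1e-6.
# `total_steps` is an assumption; the excerpt does not state the schedule length.
total_steps = 100_000
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-6
)

for _ in range(3):  # 3 iterations shown for brevity
    # A real loop would compute a loss and call loss.backward() first.
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per iteration
```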