MFOS: Model-Free & One-Shot Object Pose Estimation

Authors: JongMin Lee, Yohann Cabon, Romain Brégier, Sungjoo Yoo, Jerome Revaud

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments and report state-of-the-art one-shot performance on the challenging LINEMOD benchmark. Finally, extensive ablations allow us to determine good practices with this relatively new type of architecture in the field.
Researcher Affiliation | Collaboration | Jong Min Lee (1), Yohann Cabon (2), Romain Brégier (2), Sungjoo Yoo (1), Jerome Revaud (2); (1) Seoul National University, (2) Naver Labs Europe
Pseudocode | No | The paper describes its architecture and procedures textually and with diagrams but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | To ensure the generalization capability of our model, we train it on a diverse set of datasets covering a large panel of diversity. Specifically, we choose the large-scale ABO dataset (Collins et al. 2022)... We also use for training some datasets of the BOP challenge (Hodan et al. 2018)... Additionally, we incorporate the OnePose dataset (Sun et al. 2022)...
Dataset Splits | Yes | For the evaluation, we use the standard train-test split proposed in (Li, Wang, and Ji 2019) and follow the protocol defined in OnePose++ (He et al. 2023)... For the OnePose (Sun et al. 2022) and ABO (Collins et al. 2022) datasets, we use the official test splits as well.
Hardware Specification | Yes | We report median computation times obtained with an NVIDIA V100 GPU, repeating measures 10 times for robustness.
Software Dependencies | No | The paper mentions several tools and models used (e.g., ViT-Base/16, AdamW, CroCo v2, SQPnP, YOLOv5) but does not provide specific version numbers for underlying software libraries or programming environments (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Network architecture and training hyper-parameters. We use a ViT-Base/16 (Dosovitskiy et al. 2021) for the image encoder... We use relative positional encoding (RoPE (Su et al. 2021))... We train our network with AdamW with β = (0.9, 0.95) and a cosine-decaying learning rate going from 10⁻⁴ to 10⁻⁶. We initialize the network weights using CroCo v2... During training, we feed the network with batches of 16 × 48 = 768 images...
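The hyper-parameters quoted in the Experiment Setup row map onto standard deep-learning tooling. The following is a minimal sketch, assuming PyTorch and the timm library (the paper names neither); the model identifier, total step count, and batch composition are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the quoted training hyper-parameters.
# Assumptions (not from the paper): PyTorch + timm, 100k training steps,
# and a stock ViT-Base/16 (the authors' encoder uses RoPE and is
# initialized from CroCo v2, which this sketch does not reproduce).
import timm
import torch

encoder = timm.create_model("vit_base_patch16_224", pretrained=False)

# AdamW with beta = (0.9, 0.95); cosine decay from 1e-4 down to 1e-6.
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4, betas=(0.9, 0.95))
total_steps = 100_000  # assumption: the excerpt does not quote a step count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=1e-6
)

# Batches of 16 x 48 = 768 images per training iteration.
batch_size = 16 * 48

# Per iteration: forward/backward on a 768-image batch, then
# optimizer.step(); optimizer.zero_grad(); scheduler.step()
```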
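Similarly, the median-of-10 timing protocol quoted in the Hardware Specification row could be approximated as below. This is a generic GPU timing sketch assuming PyTorch on a CUDA device, not the authors' benchmarking code; `model` and `inputs` are hypothetical placeholders, and the warm-up run is an assumption based on common practice.

```python
# Generic sketch of a median-of-10 GPU timing measurement.
# Assumes PyTorch on a CUDA device; `model` and `inputs` are placeholders.
import statistics
import time
import torch

def median_runtime(model, inputs, repeats=10):
    timings = []
    with torch.no_grad():
        model(inputs)                  # warm-up run (assumption; not quoted)
        torch.cuda.synchronize()
        for _ in range(repeats):
            start = time.perf_counter()
            model(inputs)
            torch.cuda.synchronize()   # wait for the GPU before stopping the clock
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)  # median over the repeated runs
```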