SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation

Authors: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate our model through real-world experiments with a robot hand, opting for direct assessment in real-world settings to leverage the superior stability of large vision models like DINO (Caron et al., 2021; Oquab et al., 2023) on real images over synthetic ones." (see the feature-extraction sketch below) |
| Researcher Affiliation | Academia | (1) CFCS, School of Computer Science, Peking University, China; (2) Department of Computer Science, Stanford University, USA; (3) Institute for AI, Peking University, China; (4) PKU-WUHAN Institute for Artificial Intelligence, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://helloqxwang.github.io/SparseDFF |
| Open Datasets | Yes | "Box: A demonstration with the Cheez-It box from the YCB dataset (Calli et al., 2015b; 2017; 2015a) (ID=3) is utilized for initial evaluation." |
| Dataset Splits | No | The paper describes training and testing procedures but does not state explicit train/validation/test splits (percentages or sample counts). |
| Hardware Specification | Yes | "After this setup, our feature network takes 20000 iterations for adaptation, roughly 300 seconds using a single NVIDIA GeForce RTX 3090. Once trained, the network is applied unchanged to different real-world scenes to optimize the hand pose for 300 iterations, roughly 20 seconds using a single NVIDIA GeForce RTX 3090." |
| Software Dependencies | No | The paper mentions using large vision models like DINO, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | "In our implementation, we set $\lambda_{\text{pen}} = 10^{-1}$, $\lambda_{\text{spen}} = 10^{-2}$, $\lambda_{\text{pose}} = 10^{-2}$." (see the pose-optimization sketch below) |
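
For context on the Research Type row: the paper leans on pretrained DINO features computed on real RGB images. Below is a minimal sketch of dense ViT feature extraction using the public facebookresearch/dino hub release; the specific variant (dino_vits8), the 224-pixel input resolution, the preprocessing constants, and the file name rgb_view_0.png are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: dense DINO ViT features from one real RGB view.
# Model variant, resolution, and preprocessing are illustrative assumptions.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("rgb_view_0.png").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # get_intermediate_layers returns per-token embeddings of the last
    # n transformer blocks; tokens[:, 1:] drops the [CLS] token, leaving
    # one feature vector per 8x8 image patch.
    tokens = model.get_intermediate_layers(img, n=1)[0]
    patch_feats = tokens[:, 1:, :]                  # (1, 28*28, 384)
    h = w = 224 // 8
    feature_map = patch_feats.reshape(1, h, w, -1)  # dense 2D feature map
```

Per-view feature maps like this would then be lifted to 3D and distilled into a scene-level feature field; that distillation step is specific to the paper's pipeline and is not sketched here.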
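
The Hardware Specification and Experiment Setup rows together describe a short gradient-based refinement: roughly 300 iterations of optimizing the hand pose against the distilled feature field, with the λ weights above balancing the energy terms. A minimal sketch follows, assuming hypothetical energy callables (energy_dist for feature distance, energy_pen for penetration, energy_spen for self-penetration, energy_pose for a pose prior); the optimizer choice and learning rate are also assumptions.

```python
# Minimal sketch of the 300-iteration hand-pose refinement. The energy
# terms are hypothetical stand-ins; the lambda weights mirror the
# Experiment Setup row, and Adam with lr=1e-3 is an assumption.
import torch

LAMBDA_PEN, LAMBDA_SPEN, LAMBDA_POSE = 1e-1, 1e-2, 1e-2

def optimize_hand_pose(pose_init, energy_dist, energy_pen,
                       energy_spen, energy_pose,
                       n_iters=300, lr=1e-3):
    """Refine a hand pose by descending a weighted sum of energies.
    Each energy_* callable maps a pose tensor to a scalar tensor."""
    pose = pose_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        loss = (energy_dist(pose)
                + LAMBDA_PEN * energy_pen(pose)
                + LAMBDA_SPEN * energy_spen(pose)
                + LAMBDA_POSE * energy_pose(pose))
        loss.backward()
        opt.step()
    return pose.detach()
```

At roughly 20 seconds for 300 iterations on an RTX 3090 (per the Hardware Specification row), each step costs about 67 ms, which is consistent with evaluating a few network-backed energy terms per iteration.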