SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation
Authors: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model through real-world experiments with a robot hand, opting for direct assessment in real-world settings to leverage the superior stability of large vision models like DINO (Caron et al., 2021; Oquab et al., 2023) on real images over synthetic ones. |
| Researcher Affiliation | Academia | 1 CFCS, School of Computer Science, Peking University, China 2 Department of Computer Science, Stanford University, USA 3 Institute for AI, Peking University, China 4 PKU-WUHAN Institute for Artificial Intelligence, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://helloqxwang.github.io/SparseDFF |
| Open Datasets | Yes | Box: A demonstration with the Cheez-It box from the YCB dataset (Calli et al., 2015b; 2017; 2015a) (ID=3) is utilized for initial evaluation. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly state specific training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | After this setup, our feature network takes 20,000 iterations for adaptation, roughly 300 seconds using a single NVIDIA GeForce RTX 3090. Once trained, the network is applied unchanged to different real-world scenes to optimize the hand pose for 300 iterations, roughly 20 seconds using a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions using large vision models like DINO, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | In our implementation, we set λpen = 10^1, λspen = 10^2, λpose = 10^2. |
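
For context, the weights quoted above enter the paper's hand-pose optimization objective, which combines a feature-distance term with penetration, self-penetration, and pose-regularization penalties. The sketch below is a minimal reconstruction assuming the additive weighted-sum form implied by the λ names; the symbols E_dist, E_pen, E_spen, and E_pose follow the paper's described terms, but their exact definitions are in the paper itself.

```latex
% Sketch of the hand-pose optimization energy (additive form assumed from the paper's notation):
%   E_dist : feature distance between demonstration and test grasp points
%   E_pen  : hand-object penetration penalty
%   E_spen : hand self-penetration penalty
%   E_pose : regularizer keeping the optimized pose near the demonstration pose
\[
  E(\theta) \;=\; E_{\mathrm{dist}}(\theta)
  \;+\; \lambda_{\mathrm{pen}}\, E_{\mathrm{pen}}(\theta)
  \;+\; \lambda_{\mathrm{spen}}\, E_{\mathrm{spen}}(\theta)
  \;+\; \lambda_{\mathrm{pose}}\, E_{\mathrm{pose}}(\theta),
\]
\[
  \lambda_{\mathrm{pen}} = 10^{1}, \qquad
  \lambda_{\mathrm{spen}} = 10^{2}, \qquad
  \lambda_{\mathrm{pose}} = 10^{2}.
\]
```

Under this reading, the pose parameters θ are refined for the 300 optimization iterations reported in the hardware row, with the penalty terms weighted more heavily than the raw feature distance.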