VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception
Authors: Zhaoliang Wan, Yonggen Ling, Senlin Yi, Lu Qi, Wang Wei Lee, Minglei Lu, Sicheng Yang, Xiao Teng, Peng Lu, Xu Yang, Ming-Hsuan Yang, Hui Cheng
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper addresses the scarcity of large-scale datasets for accurate object-in-hand pose estimation...VinT-6D comprises 2 million VinT-Sim and 0.1 million VinT-Real splits, collected via simulations in MuJoCo and Blender and a custom-designed real-world platform. Built upon VinT-6D, we present a benchmark method that shows significant improvements in performance by fusing multi-modal information. Extensive experiments show the effectiveness of our method compared with the other works. (Abstract and Introduction) Section 5. Experimental Results. |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China 2Robotics X, Tencent, Shenzhen, China 3University of California, Merced, Merced, USA 4Institute of Automation, Chinese Academy of Sciences, Beijing, China. |
| Pseudocode | No | The paper describes the architecture of VinT-Net (Section 4) and its sub-modules, including an overview diagram (Figure 8). However, it does not provide any pseudocode blocks or algorithm listings labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | The project is available at https://VinT-6D.github.io/. |
| Open Datasets | Yes | VinT-6D comprises 2 million VinT-Sim and 0.1 million VinT-Real splits, collected via simulations in MuJoCo and Blender and a custom-designed real-world platform. (Abstract) The project is available at https://VinT-6D.github.io/. |
| Dataset Splits | No | The paper mentions training parameters like learning rate, batch size, and epochs (Section 5.1 Implementation Details). However, it does not explicitly specify the percentages or counts for training, validation, and test splits for its dataset, nor does it reference predefined splits from established benchmarks for reproducibility. |
| Hardware Specification | Yes | The training and testing processes were executed on a computing server equipped with 6 Quadro RTX 8000 GPUs. The VinT-Sim synthesis procedures were conducted on a cloud computing platform that utilized 16 NVIDIA P40 GPUs. |
| Software Dependencies | No | The paper mentions software tools like MuJoCo and Blender for simulation, and SAM (Segment Anything Model) for segmentation. However, it does not provide specific version numbers for these or any other software components, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | We used the Adam optimizer with an initial learning rate of 0.01 for training and set the batch size at 24. The training process was conducted over 25 epochs, and we set the hyper-parameters λ1, λ2, and λ3 to 1, 2, and 1, respectively. |
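The reported loss weights (λ1=1, λ2=2, λ3=1) imply a linear combination of three loss terms in the training objective. A minimal sketch of that weighting, assuming a simple weighted sum; the individual terms `l1`, `l2`, `l3` are placeholders, not the paper's actual loss definitions:

```python
# Reported hyper-parameters from Section 5.1: λ1=1, λ2=2, λ3=1.
LAMBDA = (1.0, 2.0, 1.0)

def total_loss(l1: float, l2: float, l3: float) -> float:
    """Combine three per-term losses with the reported λ weights.

    The terms themselves are hypothetical stand-ins; only the
    weighting scheme is taken from the paper's reported setup.
    """
    lam1, lam2, lam3 = LAMBDA
    return lam1 * l1 + lam2 * l2 + lam3 * l3

# Example: per-term losses of 0.5, 0.25, 0.1
# combine as 1*0.5 + 2*0.25 + 1*0.1 = 1.1
print(total_loss(0.5, 0.25, 0.1))
```

The other reported settings (Adam optimizer, initial learning rate 0.01, batch size 24, 25 epochs) would configure the training loop around this objective.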