DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
Authors: Rong Wang, Wei Mao, Hongdong Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization. |
| Researcher Affiliation | Academia | Rong Wang, Wei Mao, Hongdong Li, The Australian National University; {rong.wang, wei.mao, hongdong.li}@anu.edu.au |
| Pseudocode | No | No explicit pseudocode or algorithm blocks found. Figure 2 provides a schematic overview, not a step-by-step algorithm. |
| Open Source Code | Yes | The code is available at https://github.com/rongakowang/DeepSimHO. |
| Open Datasets | Yes | We evaluate our method and state-of-the-art methods on two datasets: DexYCB [6] and HO3D [18]. |
| Dataset Splits | Yes | We use the official "S0" train-test split for the training and evaluation. Following [52], we evaluate on right hand poses and filter out samples in which the hand or object is not within the field of view of the camera. To ensure consistent comparison in physics metrics, we remove test samples where the hand does not interact with the object and only select those that remain stable after simulation (see the GT results in Table 1), resulting in a total of 6348 samples. For training, we do not perform this selection of stability in order to include more data. However, we mask out the stability loss on unstable training samples, e.g. no hand-object interaction, to avoid misleading supervision. The HO3D dataset [18] consists of 66K frames featuring 10 different objects. We select the "v2" version that is mostly evaluated by previous works [56, 20, 52, 34]. Since its ground truth hand poses for the test set are not released, we follow [56] to evaluate on a subset named "v2 ", whose physics plausibility is manually verified by [56]. For training data, we use the official HO3D v2 training split and follow the same practice as the DexYCB dataset to perform sample selection and loss masking. The total HO3Dv2 test set consists of 6076 samples. (A sketch of this sample selection and loss masking appears after the table.) |
| Hardware Specification | Yes | We implement the model in PyTorch [42] and train it using the Adam [29] optimizer on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions PyTorch and MuJoCo but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the learning rate to 5e-5. When training on both datasets, we follow [34, 52] to crop input images to 224×224 pixels using the provided hand-object bounding boxes. In addition, we follow [34] and perform data augmentation with random translation and rescaling by a factor of 0.1. However, we exclude rotation augmentation as it can affect the ground truth stability. Finally, we set λ_h = 0.5, λ_d = 0.1, λ_s = 0.1, and follow [34] to set λ_o1 = 0, λ_o2 = 0.2 on the DexYCB dataset and λ_o1 = 0.2, λ_o2 = 0 on the HO3D dataset for a fair comparison. For physics simulation, we use the MuJoCo [48] simulator. ... We set the gravity acceleration as 9.8 m/s² in the y direction of the camera frame. For the adhesion force, we empirically set the gain as 100 and the maximum control range as 10... Finally, we set the simulation step to be T = 100 and the time duration in each step as t = 0.02. (See the training-configuration and simulation sketches after the table.) |
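
The Dataset Splits row describes a two-part scheme: test samples are kept only when the hand contacts the object and the ground-truth grasp stays stable under simulation, while training keeps all samples but masks the stability loss on unstable ones. A minimal sketch of that logic; the helpers `has_contact` and `is_stable_after_sim` are hypothetical stand-ins for the paper's contact test and MuJoCo stability check:

```python
import torch

def select_test_samples(samples, has_contact, is_stable_after_sim):
    """Keep only test samples where the hand touches the object and the
    ground-truth grasp remains stable after simulation (cf. GT results
    in the paper's Table 1)."""
    return [s for s in samples if has_contact(s) and is_stable_after_sim(s)]

def masked_stability_loss(stability_loss, stable_mask):
    """Zero out the stability loss on unstable training samples (e.g. no
    hand-object interaction) so they provide no misleading supervision.

    stability_loss: (B,) per-sample losses; stable_mask: (B,) bool.
    """
    masked = stability_loss * stable_mask.float()
    return masked.sum() / stable_mask.float().sum().clamp(min=1.0)
```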
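
The Experiment Setup and Hardware rows fix the optimizer and the loss weighting. A minimal training-configuration sketch under those quoted values; the network and the individual loss terms here are placeholders, not the paper's actual modules:

```python
import torch

# Placeholder network standing in for the pose estimator (not the paper's model).
model = torch.nn.Linear(512, 61)

# Adam with the quoted learning rate of 5e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# Quoted loss weights for DexYCB; on HO3D, swap to lambda_o1 = 0.2, lambda_o2 = 0.
lambda_h, lambda_d, lambda_s = 0.5, 0.1, 0.1
lambda_o1, lambda_o2 = 0.0, 0.2

def total_loss(L_h, L_d, L_s, L_o1, L_o2):
    # Weighted sum of per-term losses; the term names mirror the λ subscripts
    # quoted above, and the caller supplies each term.
    return (lambda_h * L_h + lambda_d * L_d + lambda_s * L_s
            + lambda_o1 * L_o1 + lambda_o2 * L_o2)
```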
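
The quoted MuJoCo settings (gravity 9.8 m/s² along camera y, adhesion gain 100, control range 10, T = 100 steps of t = 0.02 s each) can be written directly in an MJCF model. A sketch using the official `mujoco` Python bindings; the hand and object geometry is a placeholder, since the paper builds these bodies from the estimated meshes:

```python
import mujoco
import numpy as np

# Minimal MJCF carrying only the solver settings quoted in the paper.
# The hand/object geoms are placeholder primitives.
MODEL_XML = """
<mujoco>
  <option timestep="0.02" gravity="0 9.8 0"/>  <!-- 9.8 m/s^2 along camera y -->
  <worldbody>
    <body name="object" pos="0 0 0">
      <freejoint/>
      <geom type="box" size="0.03 0.03 0.03" mass="0.1"/>
    </body>
    <body name="hand" pos="0 -0.05 0">
      <geom type="sphere" size="0.02"/>
    </body>
  </worldbody>
  <actuator>
    <!-- Adhesion force with the quoted gain 100 and control range 10. -->
    <adhesion name="grasp" body="hand" gain="100" ctrlrange="0 10"/>
  </actuator>
</mujoco>
"""

def simulate_displacement(xml=MODEL_XML, steps=100):
    """Run T = 100 simulation steps of t = 0.02 s each and return how far
    the object moved, a simple proxy for the paper's stability measurement."""
    model = mujoco.MjModel.from_xml_string(xml)
    data = mujoco.MjData(model)
    mujoco.mj_forward(model, data)  # populate initial body positions
    start = data.body("object").xpos.copy()
    for _ in range(steps):
        mujoco.mj_step(model, data)
    return float(np.linalg.norm(data.body("object").xpos - start))
```

In this sketch the object is judged against its displacement after the rollout; how the paper thresholds displacement into a binary stability label is not specified in the quoted text, so the function returns the raw distance.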