Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

Authors: Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando D De la Torre

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on several benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms existing SOTAs, achieving the PA-MPVPE of 5.3mm and F@15mm of 0.992 on FreiHAND."
Researcher Affiliation | Academia | "Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre, Carnegie Mellon University, {haoyed, achharia, wgou, fvicente, ftorre}@andrew.cmu.edu"
Pseudocode | Yes | "Algorithm 1: Graph-guided State Space (GSS) block" (an illustrative sketch of such a block follows the table)
Open Source Code | Yes | "Our code was included in the Supplementary .zip file during the NeurIPS review. We will open-source it shortly with a detailed readme on the project's GitHub repository."
Open Datasets | Yes | "We train Hamba on 2.7M training samples from multiple datasets (same setting as [70] for a fair comparison) that had either both 2D and 3D hand annotations or just 2D annotations. This included FreiHAND [111], HO3D [29], MTC [91], RHD [110], InterHand2.6M [64], H2O3D [29], DexYCB [6], COCO-WholeBody [36], Halpe [21], and MPII NZSL [79] datasets." (see the dataset-pooling sketch below)
Dataset Splits | No | The paper mentions "Early stopping was used after 170k steps to prevent overfitting", implying the use of a validation set, but it does not specify the explicit split (e.g., percentages or counts) or how the validation data was partitioned from the training samples.
Hardware Specification | Yes | "The Joints Regressor (JR) was trained on a single NVIDIA A4500 GPU... The complete Hamba model was trained on two NVIDIA A6000 GPUs... Hamba (Ours): 1 A100, 300K Steps"
Software Dependencies | No | The paper mentions the torch.nn.functional.grid_sample module of PyTorch but does not specify the version number for PyTorch or other software dependencies. (a minimal grid_sample usage sketch follows the table)
Experiment Setup | Yes | "We set the learning rate as 10^-5 and the weight decay factor as 10^-4, with the sum loss. Weights for each term in the loss function are λ_3D = 0.05 for the 3D keypoint loss, λ_2D = 0.01 for the 2D keypoint loss, and λ_θ = 0.001 for the global orientation and hand pose loss. Weights for the beta and adversarial losses, i.e., λ_β and λ_adv, were set as 0.0005." (expressed as code below)
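
The paper's Algorithm 1 describes the Graph-guided State Space (GSS) block. As a rough illustration of what a graph-guided bidirectional state-space block could look like, here is a minimal PyTorch sketch. Everything in it — the module layout, the use of the mamba-ssm package, the graph-mixing step, and the residual structure — is an assumption for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed dependency (pip install mamba-ssm); needs a CUDA build

class GSSBlock(nn.Module):
    """Hypothetical Graph-guided State Space block: a graph-convolution-style
    mixing step over hand-joint tokens, followed by a bidirectional Mamba scan.
    Structure is guessed from the paper's description, not its released code."""
    def __init__(self, dim, adj):
        super().__init__()
        # Row-normalized adjacency over the joint graph (include self-loops
        # so every row sum is nonzero).
        self.register_buffer("adj", adj / adj.sum(-1, keepdim=True))
        self.graph_proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.ssm_fwd = Mamba(d_model=dim)   # forward scan over the token order
        self.ssm_bwd = Mamba(d_model=dim)   # backward scan (on flipped tokens)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, x):                   # x: (B, N_joints, dim)
        # Graph guidance: mix each joint token with its graph neighbors.
        x = x + self.graph_proj(self.adj @ x)
        h = self.norm(x)
        # Bi-directional scan: forward order plus reversed order, then fuse.
        fwd = self.ssm_fwd(h)
        bwd = self.ssm_bwd(h.flip(1)).flip(1)
        return x + self.out(torch.cat([fwd, bwd], dim=-1))

# Usage (21 hand joints; identity adjacency as a trivial stand-in):
block = GSSBlock(dim=256, adj=torch.eye(21)).cuda()
tokens = block(torch.randn(2, 21, 256, device="cuda"))
```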
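
The Open Datasets row lists ten sources pooled into 2.7M training samples. One plausible way to wire up such pooling in PyTorch is `ConcatDataset`; in this sketch, dummy tensor datasets stand in for the real FreiHAND, HO3D, etc. loaders, which would each need to emit samples in a shared format after preprocessing.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Dummy stand-ins for the real datasets; each yields (image, 3D joints).
sources = [TensorDataset(torch.randn(100, 3, 224, 224), torch.randn(100, 21, 3))
           for _ in range(10)]
train_pool = ConcatDataset(sources)          # 1,000 dummy samples in total
loader = DataLoader(train_pool, batch_size=64, shuffle=True)
```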
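
The Software Dependencies row points at PyTorch's `torch.nn.functional.grid_sample`. As a hedged illustration of how a hand-reconstruction model might use it, the self-contained snippet below bilinearly samples backbone features at 2D joint locations; all shapes, the feature-map size, and the sampling role are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Feature map from a backbone: (B, C, H, W); 2D joints in pixel coordinates.
feats = torch.randn(1, 256, 32, 32)
joints_2d = torch.rand(1, 21, 2) * 32            # 21 joints, illustrative

# grid_sample expects coordinates normalized to [-1, 1].
grid = (joints_2d / 32) * 2 - 1                  # (B, 21, 2)
grid = grid.view(1, 21, 1, 2)                    # (B, H_out=21, W_out=1, 2)

# Bilinearly sample one C-dim feature vector per joint: (B, C, 21, 1).
joint_feats = F.grid_sample(feats, grid, mode="bilinear", align_corners=False)
joint_feats = joint_feats.squeeze(-1).transpose(1, 2)   # (B, 21, C)
```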
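
The Experiment Setup row's weighted sum loss, written out as code with the coefficients quoted above; the individual loss terms are placeholders, not the paper's actual loss functions.

```python
import torch

# Loss weights as reported in the paper.
LAMBDA_3D, LAMBDA_2D = 0.05, 0.01
LAMBDA_THETA, LAMBDA_BETA, LAMBDA_ADV = 0.001, 0.0005, 0.0005

def total_loss(l3d, l2d, l_theta, l_beta, l_adv):
    """Weighted sum of the 3D keypoint, 2D keypoint, pose/orientation,
    beta, and adversarial loss terms."""
    return (LAMBDA_3D * l3d + LAMBDA_2D * l2d + LAMBDA_THETA * l_theta
            + LAMBDA_BETA * l_beta + LAMBDA_ADV * l_adv)

# Example with dummy scalar losses:
loss = total_loss(*[torch.tensor(1.0) for _ in range(5)])
```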