Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Authors: Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms existing SOTAs, achieving the PA-MPVPE of 5.3mm and F@15mm of 0.992 on FreiHAND. |
| Researcher Affiliation | Academia | Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre, Carnegie Mellon University, {haoyed, achharia, wgou, fvicente, ftorre}@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Graph-guided State Space (GSS) block |
| Open Source Code | Yes | Our code was included in the Supplementary .zip file during the NeurIPS review. We will open-source it shortly with a detailed readme on the project's GitHub repository. |
| Open Datasets | Yes | We train Hamba on 2.7M training samples from multiple datasets (same setting as [70] for a fair comparison) that had either both 2D and 3D hand annotations or just 2D annotations. This included FreiHAND [111], HO3D [29], MTC [91], RHD [110], InterHand2.6M [64], H2O3D [29], DexYCB [6], COCO-Wholebody [36], Halpe [21], and MPII NZSL [79] datasets. |
| Dataset Splits | No | The paper mentions 'Early stopping was used after 170k steps to prevent overfitting', implying the use of a validation set, but it does not specify the explicit split (e.g., percentages or counts) or how the validation data was partitioned from the training samples. |
| Hardware Specification | Yes | The Joints Regressor (JR) was trained on a single NVIDIA A4500 GPU... The complete Hamba model was trained on two NVIDIA A6000 GPUs... Hamba (Ours): 1 A100, 300K Steps |
| Software Dependencies | No | The paper mentions the 'pytorch.nn.Functional.grid_sample module of PyTorch' but does not specify version numbers for PyTorch or other software dependencies (a hedged grid_sample sketch follows the table). |
| Experiment Setup | Yes | We set learning rate as 10^-5, weight decay factor as 10^-4, with the sum loss. Weights for each term in the loss function are λ3D = 0.05 for 3D keypoint loss, λ2D = 0.01 for 2D keypoint loss, λθ = 0.001 for global orientation and hand pose loss. Weights for beta and adversarial loss, i.e., λβ and λadv were set as 0.0005 (a hedged loss-weighting sketch follows the table). |
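The Software Dependencies row above quotes the paper's use of PyTorch's grid_sample without a version pin. For readers checking reproducibility, below is a minimal sketch of how bilinear feature sampling at 2D joint locations is typically done with `torch.nn.functional.grid_sample`; the function name, tensor shapes, and coordinate normalization here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def sample_joint_tokens(feature_map: torch.Tensor, joints_2d: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample one feature token per 2D joint from a feature map.

    feature_map: (B, C, H, W) backbone features.
    joints_2d:   (B, J, 2) joint locations in (x, y), normalized to [-1, 1],
                 which is the coordinate convention grid_sample expects.
    Returns:     (B, J, C) sampled joint tokens.
    """
    B = feature_map.shape[0]
    # grid_sample wants a (B, H_out, W_out, 2) sampling grid; using W_out = 1
    # makes each joint produce exactly one bilinear sample.
    grid = joints_2d.view(B, -1, 1, 2)
    sampled = F.grid_sample(feature_map, grid, mode="bilinear", align_corners=False)  # (B, C, J, 1)
    return sampled.squeeze(-1).permute(0, 2, 1)  # (B, J, C)

# Illustrative shapes: a 256-channel 16x16 feature map and 21 MANO joints
# per image yield a (2, 21, 256) token tensor.
feats = torch.randn(2, 256, 16, 16)
joints = torch.rand(2, 21, 2) * 2 - 1
tokens = sample_joint_tokens(feats, joints)
```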
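The Experiment Setup row lists only the loss weights, so here is a minimal sketch of how those weights could combine into a single training objective. Only the weight values come from the quoted text; the dictionary keys and the helper name are hypothetical.

```python
import torch

# Loss weights quoted in the Experiment Setup row.
LAMBDA_3D, LAMBDA_2D, LAMBDA_THETA = 0.05, 0.01, 0.001
LAMBDA_BETA = LAMBDA_ADV = 0.0005

def total_loss(losses: dict) -> torch.Tensor:
    """Weighted sum of the individual loss terms.

    `losses` maps illustrative term names to scalar tensors, e.g. errors on
    3D keypoints, 2D keypoints, global orientation/hand pose, MANO shape
    (beta), and an adversarial term.
    """
    return (LAMBDA_3D * losses["kp3d"]
            + LAMBDA_2D * losses["kp2d"]
            + LAMBDA_THETA * losses["pose"]
            + LAMBDA_BETA * losses["beta"]
            + LAMBDA_ADV * losses["adv"])
```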