Towards A Richer 2D Understanding of Hands at Scale
Authors: Tianyi Cheng, Dandan Shan, Ayda Hassen, Richard Higgins, David Fouhey
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze our approach and dataset through a series of experiments. Across datasets, our results show that hand detection models trained on our dataset have strong zero-shot performance compared against past hand detection datasets [53, 16, 1, 44], demonstrating the expansiveness of Hands23. Within our new dataset, our experiments demonstrate that our model can detect our detailed hand-object state both well and better than past efforts such as [53]. |
| Researcher Affiliation | Academia | Tianyi Cheng^1, Dandan Shan^1, Ayda Sultan^{1,2}, Richard E. L. Higgins^1, David F. Fouhey^{1,3}; ^1 University of Michigan, ^2 Addis Ababa University, ^3 New York University. {evacheng, dandans, ayhassen, relh, fouhey}@umich.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is being released or provide a link to a code repository. |
| Open Datasets | Yes | Hands23 provides unified annotations for four datasets: EPIC-KITCHENS [13] VISOR [14], the 2017 train set of COCO [35], Internet Articulation [46], as well as on our introduced dataset of interaction-rich videos, New Days. All underlying image data used are public; when posted by users, they were posted publicly with a Creative Commons license selected. |
| Dataset Splits | Yes | We provide 80/10/10% train/val/test splits that split video data by channel and are backwards compatible with existing datasets. These are documented in the supplement. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and models like Mask RCNN, Point Rend, DETR, and SAM but does not provide specific version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | We set t_H = 0.7, t_O = 0.5, t_S = 0.3, t_A = 0.1, and t_I = 0.7 via grid search on the validation set for thresholds that achieve the best evaluation results. Performance is relatively insensitive to the hand threshold, but low thresholds for hands led to poor performance on interaction prediction. (See the grid-search sketch after this table.) |
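
The channel-level 80/10/10 split described in the Dataset Splits row can be illustrated with a short sketch. The paper does not release this code; the `videos` structure, the grouping logic, and the seed below are assumptions for illustration only.

```python
# Hypothetical sketch of a channel-level 80/10/10 train/val/test split.
# The paper states only that video data is split by channel; everything
# else here (input format, shuffling, seed) is an assumption.
import random
from collections import defaultdict

def split_by_channel(videos, seed=0):
    """videos: list of (video_id, channel_id) pairs."""
    by_channel = defaultdict(list)
    for video_id, channel_id in videos:
        by_channel[channel_id].append(video_id)

    channels = sorted(by_channel)
    random.Random(seed).shuffle(channels)

    n = len(channels)
    cut_train, cut_val = int(0.8 * n), int(0.9 * n)
    split_channels = {
        "train": channels[:cut_train],
        "val": channels[cut_train:cut_val],
        "test": channels[cut_val:],
    }
    # Every video from a given channel lands in exactly one split,
    # so no channel leaks across train/val/test.
    return {name: [v for c in chans for v in by_channel[c]]
            for name, chans in split_channels.items()}
```

Splitting at the channel level rather than the video level keeps footage from the same uploader out of both train and test, which is presumably the motivation for this design.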
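
The Experiment Setup row reports five per-head confidence thresholds chosen by grid search on the validation set. Below is a minimal sketch of such a search; the grid values, the exhaustive five-way product, and the `evaluate_on_val` callback are assumptions for illustration, not the authors' procedure.

```python
# Minimal grid-search sketch for the per-head thresholds
# (t_H, t_O, t_S, t_A, t_I) reported above. `evaluate_on_val` is a
# hypothetical stand-in for the paper's validation-set evaluation.
import itertools

GRID = [0.1, 0.3, 0.5, 0.7, 0.9]  # assumed candidate values

def grid_search_thresholds(evaluate_on_val):
    best_score, best = float("-inf"), None
    for t_h, t_o, t_s, t_a, t_i in itertools.product(GRID, repeat=5):
        score = evaluate_on_val(t_h=t_h, t_o=t_o, t_s=t_s,
                                t_a=t_a, t_i=t_i)
        if score > best_score:
            best_score, best = score, (t_h, t_o, t_s, t_a, t_i)
    return best, best_score
```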