Towards A Richer 2D Understanding of Hands at Scale

Authors: Tianyi Cheng, Dandan Shan, Ayda Hassen, Richard Higgins, David Fouhey

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze our approach and dataset through a series of experiments. Across datasets, our results show that hand detection models trained on our dataset have strong zero-shot performance compared against past hand detection datasets [53, 16, 1, 44], demonstrating the expansiveness of Hands23. Within our new dataset, our experiments demonstrate that our model can detect our detailed hand-object state both well and better than past efforts such as [53].
Researcher Affiliation | Academia | Tianyi Cheng¹, Dandan Shan¹, Ayda Sultan¹,², Richard E. L. Higgins¹, David F. Fouhey¹,³ (¹University of Michigan, ²Addis Ababa University, ³New York University) {evacheng, dandans, ayhassen, relh, fouhey}@umich.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | Hands23 provides unified annotations for four datasets: EPIC-KITCHENS [13] VISOR [14], the 2017 train set of COCO [35], Internet Articulation [46], as well as our newly introduced dataset of interaction-rich videos, New Days. All underlying image data are public; where posted by users, the images were posted publicly with a user-selected Creative Commons license.
Dataset Splits | Yes | We provide 80/10/10% train/val/test splits that divide video data by channel and are backwards compatible with existing datasets. These are documented in the supplement. (A sketch of a channel-level split appears after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components and models such as Mask R-CNN, PointRend, DETR, and SAM, but does not provide specific version numbers for these or other ancillary software dependencies.
Experiment Setup | Yes | We set T_H = 0.7, T_O = 0.5, T_S = 0.3, T_A = 0.1, and T_I = 0.7 via grid search on the validation set for the thresholds that achieve the best evaluation results. Performance is relatively insensitive to the hand threshold, but low thresholds for hands led to poor performance on interaction prediction. (A grid-search sketch also follows the table.)
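
The channel-level split described under Dataset Splits can be illustrated with a short sketch. This is a minimal illustration only, not the authors' released split: the actual channel assignments are documented in the Hands23 supplement, and the 'channel_id' key and 'videos' list below are hypothetical stand-ins for the dataset's real metadata.

    import random
    from collections import defaultdict

    def split_by_channel(videos, seed=0, ratios=(0.8, 0.1, 0.1)):
        # Group videos by their source channel so that every video from a
        # given channel lands in exactly one split ('channel_id' is a
        # hypothetical metadata key).
        by_channel = defaultdict(list)
        for video in videos:
            by_channel[video["channel_id"]].append(video)

        # Shuffle channels deterministically, then cut at 80% / 10% / 10%.
        channels = sorted(by_channel)
        random.Random(seed).shuffle(channels)
        n_train = int(ratios[0] * len(channels))
        n_val = int(ratios[1] * len(channels))

        assignment = {
            "train": channels[:n_train],
            "val": channels[n_train:n_train + n_val],
            "test": channels[n_train + n_val:],
        }
        return {split: [v for c in chans for v in by_channel[c]]
                for split, chans in assignment.items()}

Splitting at the channel level, rather than per video or per frame, prevents near-duplicate content from one creator leaking across the train/val/test boundary.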
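
The threshold selection described under Experiment Setup amounts to a grid search over five confidence thresholds on the validation split. The sketch below shows the generic procedure; the candidate grids and the evaluate_fn hook are assumptions, since the paper reports only the selected values (T_H = 0.7, T_O = 0.5, T_S = 0.3, T_A = 0.1, T_I = 0.7), not the grid it searched.

    from itertools import product

    # Candidate values for each confidence threshold. These grids are
    # illustrative and include the values the paper ultimately selected.
    GRID = {
        "T_H": [0.5, 0.6, 0.7, 0.8],  # hand detections
        "T_O": [0.3, 0.4, 0.5, 0.6],
        "T_S": [0.2, 0.3, 0.4],
        "T_A": [0.1, 0.2, 0.3],
        "T_I": [0.5, 0.6, 0.7, 0.8],  # interaction prediction
    }

    def grid_search(evaluate_fn):
        # evaluate_fn(thresholds) -> float is a hypothetical hook that runs
        # the trained detector on the validation split with the given
        # thresholds and returns the evaluation score to maximize.
        best_score, best_cfg = float("-inf"), None
        keys = list(GRID)
        for values in product(*(GRID[k] for k in keys)):
            cfg = dict(zip(keys, values))
            score = evaluate_fn(cfg)
            if score > best_score:
                best_score, best_cfg = score, cfg
        return best_cfg, best_score

Because the search only re-thresholds existing predictions rather than retraining the model, an exhaustive sweep like this is cheap even over several hundred threshold combinations.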