COHESIV: Contrastive Object and Hand Embedding Segmentation In Video

Authors: Dandan Shan, Richard Higgins, David Fouhey

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets. We compare with alternate methods that range from fully-supervised bounding boxes [40] to basic motion cues from optical flow [43], to saliency [51]. We show that our weakly-supervised method is comparable to the supervised bounding box detector method, while outperforming flow and saliency methods.
Researcher Affiliation | Academia | Dandan Shan, University of Michigan (dandans@umich.edu); Richard E.L. Higgins, University of Michigan (relh@umich.edu); David F. Fouhey, University of Michigan (fouhey@umich.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about code release or a link to a source-code repository for the methodology described.
Open Datasets | Yes | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets.
Dataset Splits | Yes | In the 100DOH [40] dataset, we generate 80.4K 10-frame training clips from the make drinks and make food genres... In EPICK [9], we generate 30K 10-frame training clips from action segments... We generated 8K clips from HO3D, using 5K of these clips for training. This gives 1,123 test and 482 validation images for 100DOH [40] and 1,169 test and 437 validation images for [9].
Hardware Specification | Yes | We trained the COHESIV model and all ablations on 5 GTX 1080 GPUs.
Software Dependencies | No | The paper mentions software components such as RAFT [43], FrankMocap [37], a U-Net-style [38] network, SE-Net [22], and the AdamW [29] optimizer, but does not specify their version numbers.
Experiment Setup | Yes | We use the AdamW [29] optimizer, with an initial learning rate of 10^-2 and batch size of 10. We used a learning rate scheduler that cut the learning rate in half after 5 epochs without a validation loss decrease. We used early stopping and halted training if the validation loss did not decrease after 10 epochs.
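
For quick reference, the counts quoted in the "Dataset Splits" row above can be collected into a small bookkeeping structure. This is a minimal sketch using only the numbers reported in the paper; the dict name and field names are our own, not from the authors' code (which is not released):

```python
# Split sizes as reported in the paper (quoted in the "Dataset Splits" row).
# Structure and names are hypothetical bookkeeping, not the authors' config.
REPORTED_SPLITS = {
    "100DOH": {"train_clips": 80_400, "val_images": 482, "test_images": 1_123},
    "EPICK":  {"train_clips": 30_000, "val_images": 437, "test_images": 1_169},
    "HO3D":   {"clips_total": 8_000, "train_clips": 5_000},
}
```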
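The "Experiment Setup" row fully specifies the optimization recipe, so it can be written down directly. Below is a minimal PyTorch sketch of that recipe (AdamW at lr 10^-2, batch size 10, halve the learning rate after 5 epochs without validation improvement, early stopping after 10 such epochs); the model, data loaders, and loss function are placeholders, since the authors' actual training code is not released:

```python
import torch
from torch import nn, optim

def train(model: nn.Module, train_loader, val_loader, criterion, max_epochs=100):
    # Batch size 10 is assumed to be set when constructing the DataLoaders.
    optimizer = optim.AdamW(model.parameters(), lr=1e-2)
    # "Cut the learning rate in half after 5 epochs without a validation loss decrease"
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=5
    )
    best_val, epochs_since_best = float("inf"), 0

    for epoch in range(max_epochs):
        model.train()
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()

        # Average validation loss drives both the scheduler and early stopping.
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, targets in val_loader:
                val_loss += criterion(model(inputs), targets).item()
        val_loss /= len(val_loader)
        scheduler.step(val_loss)

        if val_loss < best_val:
            best_val, epochs_since_best = val_loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= 10:  # early stopping patience from the paper
                break
```

`ReduceLROnPlateau` with `factor=0.5, patience=5` matches the reported schedule; the early-stopping counter is written out by hand because PyTorch has no built-in equivalent.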