COHESIV: Contrastive Object and Hand Embedding Segmentation In Video
Authors: Dandan Shan, Richard Higgins, David Fouhey
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets. We compare with alternate methods that range from fully-supervised bounding boxes [40] to basic motion cues from optical flow [43], to saliency [51]. We show that our weakly-supervised method is comparable to the supervised bounding box detector method, while outperforming flow and saliency methods. |
| Researcher Affiliation | Academia | Dandan Shan University of Michigan dandans@umich.edu Richard E.L. Higgins University of Michigan relh@umich.edu David F. Fouhey University of Michigan fouhey@umich.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about code release or a link to a source-code repository for the methodology described. |
| Open Datasets | Yes | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets. |
| Dataset Splits | Yes | In the 100DOH [40] dataset, we generate 80.4K 10-frame training clips from the make drinks and make food genres... In EPICK [9], we generate 30K 10-frame training clips from action segments... We generated 8K clips from HO3D, using 5K of these clips for training. This gives 1,123 test and 482 validation images for 100DOH [40] and 1,169 test and 437 validation images for [9]. |
| Hardware Specification | Yes | We trained the COHESIV model and all ablations on 5 GTX 1080 GPUs. |
| Software Dependencies | No | The paper mentions software components like RAFT [43], FrankMocap [37], a U-Net-style [38] network, SE-Net [22], and the AdamW [29] optimizer, but does not specify their version numbers. |
| Experiment Setup | Yes | We use the AdamW [29] optimizer, with an initial learning rate of 10⁻² and batch size of 10. We used a learning rate scheduler that cut the learning rate in half after 5 epochs without a validation loss decrease. We used early stopping and halted training if the validation loss did not decrease after 10 epochs. |
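
The Experiment Setup row fully specifies the optimization schedule (AdamW, learning rate 10⁻², batch size 10, halve-on-plateau scheduling, early stopping). Below is a minimal sketch of how that schedule could be wired up in PyTorch; the model, data, and loss are placeholders rather than the paper's actual COHESIV network or contrastive objectives.

```python
# Sketch of the reported training schedule: AdamW (lr=1e-2, batch size 10),
# halve the learning rate after 5 epochs without validation-loss improvement,
# stop training after 10 epochs without improvement.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(10, 2)       # placeholder for the U-Net-style network
criterion = torch.nn.MSELoss()       # placeholder for the paper's losses
optimizer = AdamW(model.parameters(), lr=1e-2)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

best_val_loss = float("inf")
epochs_without_improvement = 0

for epoch in range(1000):
    # --- training step on a placeholder batch of size 10 ---
    model.train()
    x, y = torch.randn(10, 10), torch.randn(10, 2)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # --- validation on a placeholder batch ---
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(torch.randn(10, 10)), torch.randn(10, 2)).item()

    # cut the learning rate in half after 5 epochs without a val-loss decrease
    scheduler.step(val_loss)

    # early stopping after 10 epochs without a val-loss decrease
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= 10:
            break
```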