Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
COHESIV: Contrastive Object and Hand Embedding Segmentation In Video
Authors: Dandan Shan, Richard Higgins, David Fouhey
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets. We compare with alternate methods that range from fully-supervised bounding boxes [40] to basic motion cues from optical flow [43], to saliency [51]. We show that our weakly-supervised method is comparable to the supervised bounding box detector method, while outperforming flow and saliency methods. |
| Researcher Affiliation | Academia | Dandan Shan, University of Michigan; Richard E.L. Higgins, University of Michigan; David F. Fouhey, University of Michigan |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about code release or a link to a source-code repository for the methodology described. |
| Open Datasets | Yes | We train and validate on video data of humans engaged in complex behaviors using subsets of the 100 Days of Hands (100DOH) [40], EPIC-KITCHENS-55 (EPICK) [9, 10], and HO3D [18] datasets. |
| Dataset Splits | Yes | In the 100DOH [40] dataset, we generate 80.4K 10-frame training clips from the make drinks and make food genres... In EPICK [9], we generate 30K 10-frame training clips from action segments... We generated 8K clips from HO3D, using 5K of these clips for training. This gives 1,123 test and 482 validation images for 100DOH [40] and 1,169 test and 437 validation images for [9]. |
| Hardware Specification | Yes | We trained the COHESIV model and all ablations on 5 GTX 1080 GPUs. |
| Software Dependencies | No | The paper mentions software components like RAFT [43], FrankMocap [37], a U-Net-style [38] network, SE-Net [22], and the AdamW [29] optimizer, but does not specify their version numbers. |
| Experiment Setup | Yes | We use the AdamW [29] optimizer, with an initial learning rate of 10^-2 and batch size of 10. We used a learning rate scheduler that cut the learning rate in half after 5 epochs without a validation loss decrease. We used early stopping and halted training if the validation loss did not decrease after 10 epochs. |
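The training protocol quoted above (halve the learning rate after 5 stagnant epochs, stop after 10) can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' code: the function name `run_schedule` and the replay-over-recorded-losses structure are assumptions introduced here; only the hyperparameters (initial learning rate 10^-2, patience values 5 and 10, halving factor) come from the paper's description.

```python
def run_schedule(val_losses, init_lr=1e-2, lr_patience=5, stop_patience=10):
    """Replay a sequence of per-epoch validation losses and return the
    final learning rate and the epoch at which training halts.

    Illustrative sketch of the schedule described in the paper:
    halve the LR after `lr_patience` epochs without a validation-loss
    decrease, and stop after `stop_patience` epochs without one.
    """
    lr = init_lr
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best % lr_patience == 0:
                lr *= 0.5  # cut the learning rate in half
            if epochs_since_best >= stop_patience:
                return lr, epoch  # early stopping triggered
    return lr, len(val_losses) - 1
```

For example, a run whose validation loss never improves after the first epoch would see the learning rate halved at stagnant epochs 5 and 10, then training stops at epoch 10.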