Sequence-to-Segment Networks for Segment Detection

Authors: Zijun Wei, Boyu Wang, Minh Hoai Nguyen, Jianming Zhang, Zhe Lin, Xiaohui Shen, Radomir Mech, Dimitris Samaras

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on temporal action proposal and video summarization show that S2N achieves state-of-the-art performance on both tasks.
Researcher Affiliation | Collaboration | Stony Brook University, Adobe Research, ByteDance AI Lab
Pseudocode | No | The paper describes the architecture and training process in text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://www3.cs.stonybrook.edu/~cvl/projects/wei2018s2n/S2N_NIPS2018s.html
Open Datasets | Yes | We evaluate S2Ns on the THUMOS14 dataset [21], a challenging benchmark for the action proposal task. We perform experiments on SumMe [15], a standard benchmark for video summarization.
Dataset Splits | Yes | Following the standard practice, we train an S2N on the validation set and evaluate it on the testing set. ... We train an S2N using 180 out of 200 videos from the validation set and hold out 20 videos for validation. ... we use the canonical setting suggested in [47] for evaluation: we use the standard 5-fold cross validation (5FCV), i.e., 80% of videos are for training and the rest for testing.
Hardware Specification | Yes | Quantitatively, it takes on average 0.028s to process a 12s, 30FPS video on a GTX Titan X Maxwell GPU with 12GB memory.
Software Dependencies | No | The paper mentions the Adam optimizer but does not specify any software names with version numbers for reproducibility, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | Unless specified otherwise, the encoder is a 2-layer bi-directional GRU with 512 hidden units and a dropout rate of 0.5; the GRU module in the SDU is uni-directional with 1024 hidden units. All models are trained with the Adam optimizer [25] for 50 epochs with an initial learning rate of 0.0001, decreased by a factor of 10 when the training performance plateaued, a batch size of 32, and L2 gradient clipping of 1.0. The trade-off factor α in Eq. (2) is set to ensure that L_loc does not dominate the total loss. A weight adjustment for the score predictor is also used if necessary to account for the imbalance between the positive and negative samples.
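
To make the quoted Experiment Setup concrete, here is a minimal PyTorch sketch of the described configuration: a 2-layer bi-directional GRU encoder with 512 hidden units and dropout 0.5, a uni-directional 1024-unit GRU standing in for the SDU, Adam at 1e-4 with a 10x plateau-based decay, batch size 32, and gradient clipping at 1.0. The class names, feature dimensionality, dummy data, and the toy binary loss are illustrative assumptions and are not taken from the authors' released code; the paper's actual loss combines classification and localization terms weighted by the trade-off factor α.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the quoted training setup; FEATURE_DIM, the class
# names, the dummy data, and the toy loss are assumptions, not the authors' code.
FEATURE_DIM = 500  # assumed per-frame feature dimensionality


class VideoEncoder(nn.Module):
    """2-layer bi-directional GRU encoder, 512 hidden units, dropout 0.5."""
    def __init__(self, input_dim=FEATURE_DIM):
        super().__init__()
        self.gru = nn.GRU(input_dim, 512, num_layers=2,
                          bidirectional=True, dropout=0.5, batch_first=True)

    def forward(self, x):
        out, _ = self.gru(x)          # (batch, time, 2 * 512)
        return out


class SDU(nn.Module):
    """Uni-directional GRU with 1024 hidden units plus a toy score head."""
    def __init__(self, input_dim=2 * 512):
        super().__init__()
        self.gru = nn.GRU(input_dim, 1024, num_layers=1, batch_first=True)
        self.score = nn.Linear(1024, 1)  # stand-in for the segment score predictor

    def forward(self, x):
        out, _ = self.gru(x)
        return self.score(out[:, -1])    # one score per sequence (toy output)


encoder, sdu = VideoEncoder(), SDU()
params = list(encoder.parameters()) + list(sdu.parameters())

# Adam, initial learning rate 1e-4, reduced by 10x when performance plateaus.
optimizer = torch.optim.Adam(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

BATCH_SIZE, SEQ_LEN, NUM_EPOCHS = 32, 120, 50
for epoch in range(NUM_EPOCHS):
    # Dummy batch standing in for per-frame video features and binary targets.
    feats = torch.randn(BATCH_SIZE, SEQ_LEN, FEATURE_DIM)
    targets = torch.randint(0, 2, (BATCH_SIZE, 1)).float()

    scores = sdu(encoder(feats))
    loss = nn.functional.binary_cross_entropy_with_logits(scores, targets)

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)  # gradient clipping at 1.0
    optimizer.step()
    scheduler.step(loss.item())  # plateau-based learning-rate decay
```

In the paper the scalar loss fed to the optimizer would be the combined classification and localization objective with weight α rather than the placeholder binary cross-entropy used here; the sketch only illustrates the optimizer, scheduler, and clipping settings quoted above.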