Unsupervised Alignment of Actions in Video with Text Descriptions

Authors: Young Chol Song, Iftekhar Naim, Abdullah Al Mamun, Kaustubh Kulkarni, Parag Singla, Jiebo Luo, Daniel Gildea, Henry Kautz

IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "This section describes the evaluation of hyperfeature construction and alignment of actions on two multimodal datasets with parallel video and text." |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Rochester, Rochester, NY, USA; ²Indian Institute of Technology Delhi, New Delhi, India |
| Pseudocode | Yes | "Algorithm 1 describes this process in detail." |
| Open Source Code | No | The paper contains no explicit statement or link indicating that the source code for its methodology is publicly available. |
| Open Datasets | Yes | The Wetlab dataset [Naim et al., 2014; 2015] and the TACoS corpus [Regneri et al., 2013]; "We evaluate our system on action features generated by CNN models trained using the UCF101 action recognition dataset [Soomro et al., 2012]." |
| Dataset Splits | No | The paper evaluates on the Wetlab and TACoS datasets and states that ground-truth segmentation is used for evaluation in the latter, but it does not specify explicit training, validation, and test splits (e.g., percentages or counts) needed to reproduce the models. |
| Hardware Specification | Yes | "Each iteration per video took an average of 6.6 seconds on a single core of a 2.4GHz Intel Xeon processor with 32GB of RAM." |
| Software Dependencies | No | The paper mentions a two-stage Charniak-Johnson parser, a Kalman filter, and a modified version of the SLIC superpixel algorithm, but it does not provide version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | "For hyperfeature variables {d(1), w, d(2)}, we achieved best results using {64, 150, 32} for STIP, {128, 150, 32} for dense trajectory, and {128, 150, 64} for CNN features. For all the variations, we train LCRF models by running 200 iterations over the entire dataset." |
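The Experiment Setup and Hardware Specification rows above contain the only concrete reproduction parameters the paper reports. As a convenience, here is a minimal Python sketch recording those quoted values; the names (`BEST_HYPERFEATURE_PARAMS`, `d1`, `w`, `d2`) are our own hypothetical labels, not identifiers from the paper, and the per-video training-time figure is a back-of-the-envelope estimate derived from the quoted numbers rather than a value the authors report.

```python
# Reported best hyperfeature hyperparameters {d(1), w, d(2)}, keyed by
# low-level feature type. The keys d1/w/d2 are our labels for the three
# quoted values, not the authors' identifiers.
BEST_HYPERFEATURE_PARAMS = {
    "STIP":             {"d1": 64,  "w": 150, "d2": 32},
    "dense_trajectory": {"d1": 128, "w": 150, "d2": 32},
    "CNN":              {"d1": 128, "w": 150, "d2": 64},
}

LCRF_ITERATIONS = 200             # iterations over the entire dataset (reported)
SECONDS_PER_ITER_PER_VIDEO = 6.6  # on one core of a 2.4GHz Xeon, 32GB RAM (reported)

if __name__ == "__main__":
    for feature, p in BEST_HYPERFEATURE_PARAMS.items():
        print(f"{feature}: d(1)={p['d1']}, w={p['w']}, d(2)={p['d2']}")
    # Back-of-the-envelope estimate (assumes each iteration touches every
    # video once): 200 * 6.6 s = 1320 s, roughly 22 minutes per video.
    total = LCRF_ITERATIONS * SECONDS_PER_ITER_PER_VIDEO
    print(f"Estimated LCRF training time per video: {total:.0f} s (~{total/60:.0f} min)")
```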