Unsupervised Alignment of Actions in Video with Text Descriptions
Authors: Young Chol Song, Iftekhar Naim, Abdullah Al Mamun, Kaustubh Kulkarni, Parag Singla, Jiebo Luo, Daniel Gildea, Henry Kautz
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section describes the evaluation of hyperfeature construction and alignment of actions on two multimodal datasets with parallel video and text. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University of Rochester, Rochester, NY, USA; (2) Indian Institute of Technology Delhi, New Delhi, India |
| Pseudocode | Yes | Algorithm 1 describes this process in detail. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | The Wetlab dataset [Naim et al., 2014; 2015], the TACoS corpus [Regneri et al., 2013]; We evaluate our system on action features generated by CNN models trained using the UCF101 action recognition dataset [Soomro et al., 2012]. |
| Dataset Splits | No | The paper evaluates on datasets like Wetlab and TACoS, stating that ground truth segmentation is used for evaluation in the latter. However, it does not specify explicit training, validation, and testing splits (e.g., percentages or counts) for model reproduction. |
| Hardware Specification | Yes | Each iteration per video took an average of 6.6 seconds on a single core of a 2.4GHz Intel Xeon processor with 32GB of RAM. |
| Software Dependencies | No | The paper mentions using a 'two-stage Charniak-Johnson parser', a 'Kalman filter', and a 'modified version of the SLIC superpixel algorithm', but does not provide specific version numbers for these or any other software components used in the experiments. |
| Experiment Setup | Yes | For hyperfeature variables {d^(1), w, d^(2)}, we achieved best results using {64, 150, 32} for STIP, {128, 150, 32} for dense trajectory, and {128, 150, 64} for CNN features. For all the variations, we train LCRF models by running 200 iterations over the entire dataset. |
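The Experiment Setup row reports per-feature hyperfeature settings and the LCRF training budget. As a minimal illustrative sketch (not code from the paper; names such as `HYPERFEATURE_CONFIG` and `TRAINING_ITERATIONS` are hypothetical), the reported values could be organized as a configuration like this:

```python
# Hypothetical configuration sketch of the hyperfeature settings reported in the paper.
# d1: first-layer codebook size, w: temporal window size, d2: second-layer codebook size.
HYPERFEATURE_CONFIG = {
    "stip":             {"d1": 64,  "w": 150, "d2": 32},
    "dense_trajectory": {"d1": 128, "w": 150, "d2": 32},
    "cnn":              {"d1": 128, "w": 150, "d2": 64},
}

# LCRF models were trained for 200 iterations over the entire dataset.
TRAINING_ITERATIONS = 200

if __name__ == "__main__":
    for feature, params in HYPERFEATURE_CONFIG.items():
        print(f"{feature}: d1={params['d1']}, w={params['w']}, d2={params['d2']}")
```

Laid out this way, it is easy to see that only the codebook sizes d^(1) and d^(2) differ across STIP, dense trajectory, and CNN features, while the window size w = 150 is shared by all three.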