Adaptive Feature Abstraction for Translating Video to Text

Authors: Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.
Researcher Affiliation | Collaboration | Department of Electrical and Computer Engineering, Duke University ({yunchen.pu, zhe.gan, lcarin}@duke.edu); Machine Learning Group, NEC Laboratories America (renqiang@nec-labs.com)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We present results on three benchmark datasets: Microsoft Research Video Description Corpus (YouTube2Text) (Chen and Dolan 2011), Montreal Video Annotation Dataset (M-VAD) (Torabi, Pal, and Courville 2015), and Microsoft Research Video to Text (MSR-VTT) (Xu et al. 2016).
Dataset Splits | Yes | For fair comparison, we used the same splits as provided in Venugopalan et al. (2015b), with 1200 videos for training, 100 videos for validation, and 670 videos for testing. (An illustrative split sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software components and models such as C3D, LSTM, and RNN, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We do not perform any dataset-specific tuning and regularization other than dropout (Srivastava et al. 2014) and early stopping on validation sets. (An illustrative training-setup sketch appears after the table.)
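
For concreteness, the 1200/100/670 split reported in the Dataset Splits row can be written out in a few lines of Python. This is only a sketch: the video identifiers (`vid1` ... `vid1970`) and the contiguous index ranges are assumptions based on the commonly used MSVD ordering, and the authoritative ID-to-split mapping is the one released by Venugopalan et al. (2015b), not this code.

```python
# Illustrative sketch of the 1200/100/670 YouTube2Text split described above.
# Assumes the 1970 clips are numbered vid1..vid1970 in the conventional MSVD
# order; the authoritative split files are those of Venugopalan et al. (2015b).

def youtube2text_splits():
    video_ids = [f"vid{i}" for i in range(1, 1971)]  # 1970 clips in total
    train = video_ids[:1200]       # 1200 videos for training
    val   = video_ids[1200:1300]   # 100 videos for validation
    test  = video_ids[1300:]       # 670 videos for testing
    return train, val, test

if __name__ == "__main__":
    train, val, test = youtube2text_splits()
    assert (len(train), len(val), len(test)) == (1200, 100, 670)
    print(len(train), len(val), len(test))
```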
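
The Experiment Setup row names only two forms of regularization: dropout (Srivastava et al. 2014) and early stopping on the validation set. The PyTorch sketch below illustrates how those two choices typically fit together in a training loop. The decoder architecture, dimensions, patience value, and the `train_step`/`validate` callables are hypothetical placeholders, not values or code taken from the paper.

```python
# Minimal sketch of the regularization choices named in the Experiment Setup row:
# dropout plus early stopping on the validation set. All hyperparameters here
# are placeholders for illustration only.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000, p_drop=0.5):
        super().__init__()
        self.dropout = nn.Dropout(p_drop)                          # dropout regularization
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats):
        h, _ = self.lstm(self.dropout(feats))
        return self.out(self.dropout(h))

def train_with_early_stopping(model, train_step, validate, max_epochs=100, patience=10):
    """Stop when validation loss has not improved for `patience` consecutive epochs."""
    best_val, bad_epochs, best_state = float("inf"), 0, None
    for epoch in range(max_epochs):
        train_step(model)            # one pass over the training split
        val_loss = validate(model)   # loss on the held-out validation split
        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # early stopping criterion
                break
    if best_state is not None:
        model.load_state_dict(best_state)   # restore the best validation checkpoint
    return model
```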