Adaptive Feature Abstraction for Translating Video to Language
Authors: Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate quantitatively the effectiveness of our proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantic structures. |
| Researcher Affiliation | Collaboration | Yunchen Pu, Department of Electrical and Computer Engineering, Duke University (yunchen.pu@duke.edu); Martin Renqiang Min, Machine Learning Group, NEC Laboratories America (renqiang@nec-labs.com); Zhe Gan, Department of Electrical and Computer Engineering, Duke University (zhe.gan@duke.edu); Lawrence Carin, Department of Electrical and Computer Engineering, Duke University (lcarin@duke.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. |
| Open Datasets | Yes | We present results on Microsoft Research Video Description Corpus (YouTube2Text) (Chen & Dolan, 2011). |
| Dataset Splits | Yes | For fair comparison, we used the same splits as provided in Yu et al. (2016), with 1200 videos for training, 100 videos for validation, and 670 videos for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes data preprocessing steps (e.g., 'all videos are resized to 112 × 112 spatially, with 2 frames per second') and feature extraction methods, but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or explicit training schedules in the main text. |
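The Experiment Setup row above quotes the paper's only stated preprocessing: videos are resized to 112 × 112 spatially and sampled at 2 frames per second. The paper publishes no code, so the following is a minimal, hypothetical sketch of how such preprocessing might be reproduced, assuming OpenCV; the function name, fallback frame rate, and sampling strategy are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical preprocessing sketch (not the authors' code): resize frames to
# 112x112 and sample roughly 2 frames per second, as stated in the paper.
import cv2
import numpy as np

def sample_frames(video_path, target_fps=2.0, size=(112, 112)):
    """Read a video, keep ~target_fps frames per second, resize each to `size`."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if metadata is missing
    step = max(int(round(native_fps / target_fps)), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, size))
        idx += 1
    cap.release()
    return np.stack(frames) if frames else np.empty((0, size[1], size[0], 3), dtype=np.uint8)
```

The remaining training details (learning rate, batch size, epochs, optimizer) are not reported in the main text, so they are not reflected in this sketch.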