Self-View Grounding Given a Narrated 360° Video

Authors: Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our method, we collect the first narrated 360° videos dataset and achieve state-of-the-art NFoV-grounding performance. ... Experiments: Because the style of indoor videos and outdoor videos are different on both vision and subtitles, we first conduct the ablation studies of our proposed method and compare our model with baselines in the beginning. ... Finally, we show the results and make a brief discussion.
Researcher Affiliation | Collaboration | Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun; Department of Electrical Engineering, National Tsing Hua University; Microsoft Research, Beijing, China; {happy810705, yichun8447}@gmail.com, khzeng@cs.stanford.edu, {eborboihuc@gapp, sunmin@ee}.nthu.edu.tw, jianf@microsoft.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'We implement all of our methods by PyTorch (Paszke and Chintala)', but it does not provide a specific link or an explicit statement about releasing the source code for the described methodology.
Open Datasets | Yes | To evaluate our method, we collect the first narrated 360° videos dataset. This dataset consists of touring videos, including scenic spots and housing introductions, and subtitle files, including subtitle text and start and end timecodes. ... (Available at http://aliensunmin.github.io/project/360grounding/) ... We use ResNet-101 pre-trained on ImageNet (Deng et al. 2009) as our visual encoder, and we pre-train our language decoder on the MSCOCO dataset (Lin et al. 2014b).
Dataset Splits | Yes | We assign 80% of the videos and subtitles for training and 10% each for validation and testing. (A minimal split sketch appears after the table.)
Hardware Specification | Yes | We conduct all experiments on a single computer with an Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 64GB DDR3 RAM, and an NVIDIA Titan X GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number. It also references models such as ResNet-101, GRU, and LSTM, but these are architectural components, not software dependencies with version numbers for reproducibility.
Experiment Setup | Yes | We set λ = 0.8. ... We decrease the frame rate to 1 to save memory usage and set the dictionary dimension to 9956 according to the number of words appearing in all subtitles. We randomly sample 3 consecutive frames during the training phase (i.e., k = 3) ... Since the maximal length of subtitles is 33, we set m = 33 ... We use Adam (Kingma and Ba 2015) as optimizer with default hyperparameters and a 0.001 learning rate, and set the batch size B to 4. (A hedged configuration sketch appears after the table.)
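
The 80/10/10 split referenced in the Dataset Splits row is simple enough to illustrate. Below is a minimal sketch, assuming each video is paired with its subtitle file in a single list; the paper does not describe the exact split procedure (shuffling, random seed, or any indoor/outdoor stratification), so the function name, its arguments, and the deterministic shuffle are assumptions.

```python
# Minimal sketch of an 80% / 10% / 10% split over paired (video, subtitles) items.
# Only the split ratios come from the paper; everything else is assumed.
import random

def split_dataset(video_subtitle_pairs, seed=0):
    items = list(video_subtitle_pairs)
    random.Random(seed).shuffle(items)      # deterministic shuffle (assumption)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]          # remaining ~10% for testing
    return train, val, test
```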
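
As a companion to the Experiment Setup row, the sketch below collects the quoted hyperparameters into a single PyTorch configuration. It is a sketch under stated assumptions, not the authors' implementation: the model body is a placeholder, the frame sampler and data loader that would realize k = 3 and batch size 4 are omitted, and only the ResNet-101/ImageNet encoder, λ = 0.8, k = 3, m = 33, the 9956-word dictionary, Adam with a 0.001 learning rate, and batch size 4 are taken from the paper.

```python
# Hedged sketch of the reported training configuration. The model body is a
# placeholder; only the quoted values and the ResNet-101 visual encoder
# pre-trained on ImageNet come from the paper.
import torch
import torchvision

config = {
    "lambda": 0.8,       # loss-mixing weight λ reported in the paper
    "k": 3,              # consecutive frames sampled per training step
    "m": 33,             # maximal subtitle length in words
    "vocab_size": 9956,  # dictionary dimension over all subtitle words
    "batch_size": 4,     # batch size B
    "lr": 1e-3,          # Adam learning rate (default remaining hyperparameters)
}

# Visual encoder: ResNet-101 pre-trained on ImageNet, as stated in the paper.
visual_encoder = torchvision.models.resnet101(pretrained=True)
visual_encoder.fc = torch.nn.Identity()  # expose the pooled feature vector (assumption)

# Placeholder for the full NFoV-grounding model; the paper's architecture
# (visual/language encoders plus grounding decoder) is not reconstructed here.
model = torch.nn.Sequential(visual_encoder)

optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
```

Feature dimensionality, how the encoder output feeds the grounding decoder, and the loss combined by λ are all described in the paper but left out of this sketch.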