Sequence-to-Sequence Learning via Shared Latent Representation
Authors: Xu Shen, Xinmei Tian, Jun Xing, Yong Rui, Dacheng Tao
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our SLR model is validated on the YouTube2Text and MSR-VTT datasets, achieving superior performance on the video-to-sentence task and the first sentence-to-video results. |
| Researcher Affiliation | Collaboration | CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application Systems, University of Science and Technology of China, China; Institute for Creative Technologies, University of Southern California; Lenovo Research; UBTECH Sydney Artificial Intelligence Institute, SIT, FEIT, University of Sydney, Australia |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Models are tested on the Microsoft Research Video Description Corpus (YouTube2Text) (Guadarrama et al. 2013) and the MSR-VTT dataset (Xu et al. 2016). |
| Dataset Splits | Yes | The YouTube2Text dataset contains 1,970 videos with about 40 English sentences per video. Following previous works, we randomly split 1,200 videos for training, 100 for validation, and 670 for testing, as in (Yao et al. 2015). The MSR-VTT dataset contains 6,513 videos for training, 497 for validation, and 2,990 for testing, with 20 descriptions per video. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like LSTMs, VGG, C3D, and Adam optimizer but does not provide specific version numbers for any of these or other software dependencies. |
| Experiment Setup | Yes | We use an initial learning rate of 0.0001 on YouTube2Text and 0.001 on MSR-VTT for the full-model learning stage, and decay the learning rate by a factor of 10 in the partial-model learning stage. The full-model learning stage is trained for 20 epochs on YouTube2Text and 60 epochs on MSR-VTT. The partial-model learning stage is trained for 80 and 40 epochs on YouTube2Text and MSR-VTT, respectively. Finally, we fine-tune the learned model on the specific task (i.e., video-to-sentence) for 20 epochs. We train the model with the Adam optimizer and a mini-batch size of 100. Gradients are clipped to a maximum L2 norm of 35. (A training-loop sketch appears after the table.) |
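
For reference, here is a minimal sketch of the random 1,200/100/670 split quoted above. The `video_ids` list and the fixed seed are illustrative assumptions; the paper uses the actual YouTube2Text clip identifiers and does not report a seed.

```python
import random

# Hypothetical identifiers for the 1,970 YouTube2Text clips; the real
# corpus has its own video IDs.
video_ids = [f"vid{i:04d}" for i in range(1970)]

rng = random.Random(0)  # fixed seed for repeatability; the paper reports none
shuffled = video_ids[:]
rng.shuffle(shuffled)

# 1,200 / 100 / 670 train / validation / test split, as in Yao et al. 2015.
train_ids = shuffled[:1200]
val_ids = shuffled[1200:1300]
test_ids = shuffled[1300:]

assert (len(train_ids), len(val_ids), len(test_ids)) == (1200, 100, 670)
```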
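
And a hedged PyTorch-style sketch of the reported optimization settings (Adam, learning-rate decay by 10 between stages, mini-batch size 100, gradient clipping at L2 norm 35). The paper does not name a framework, and the `nn.Linear` stand-in, dummy data loader, and placeholder loss are assumptions for runnability only, not the SLR architecture or objectives.

```python
import torch
from torch import nn

# Placeholder stand-in for the SLR model; the real encoder/decoder LSTMs
# around a shared latent representation are not reproduced here.
model = nn.Linear(16, 16)

# Dummy mini-batches; the paper uses a mini-batch size of 100.
loader = [torch.randn(100, 16) for _ in range(5)]

# YouTube2Text setting: initial learning rate 1e-4 for the full-model stage
# (1e-3 on MSR-VTT).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_stage(num_epochs):
    for _ in range(num_epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(batch).pow(2).mean()  # placeholder loss
            loss.backward()
            # Gradients clipped to a maximum L2 norm of 35.
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=35.0)
            optimizer.step()

# Stage 1: full-model learning (20 epochs on YouTube2Text, 60 on MSR-VTT).
train_stage(20)

# Stage 2: partial-model learning with the learning rate decayed by 10
# (80 epochs on YouTube2Text, 40 on MSR-VTT).
for group in optimizer.param_groups:
    group["lr"] /= 10
train_stage(80)

# Stage 3: fine-tune on the target task (e.g. video-to-sentence) for 20 epochs.
train_stage(20)
```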