A Proposal-Based Approach for Activity Image-to-Video Retrieval
Authors: Ruicong Xu, Li Niu, Jianfu Zhang, Liqing Zhang. Pages 12524-12531.
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three widely-used datasets verify the effectiveness of our approach. |
| Researcher Affiliation | Academia | MoE Key Lab of Artificial Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. {ranranxu, utscnewly, c.sis}@sjtu.edu.cn, zhang-lq@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes the proposed algorithm using textual descriptions and mathematical formulas, but it does not include a formally structured pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Therefore, we construct video-image datasets for the AIVR task based on public video datasets, i.e., the THUMOS 14 (http://crcv.ucf.edu/THUMOS14/), ActivityNet (Heilbron et al. 2015) and MED2017 Event (https://www.nist.gov/itl/iad/mig/med-2017-evaluation/) datasets, in which THUMOS 14 and ActivityNet are action-based datasets while MED2017 Event is an event-based dataset. |
| Dataset Splits | No | The paper specifies training and testing pairs for each dataset (e.g., 'for THUMOS 14 dataset, we form 1500 training pairs and 406 testing pairs'), but does not explicitly mention a dedicated validation dataset split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud computing instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions the use of models like VGG and R-C3D and discusses feature extraction, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | In the projection module, mapping functions f_v(·) (resp., f_u(·)) are implemented as three fully-connected layers as follows: f_v: v (d1 = 4096) → 500 → 200 → v̂ (r = 64) and f_u: u (d2 = 128) → 100 → 80 → û (r = 64). [...] where α and β are trade-off parameters, empirically fixed as 0.1 and 10 respectively in our experiments. [...] We extract a 4096-dim feature vector for each activity proposal, and each video is represented by a bag of top-60 proposal features (i.e., k = 60), obtained by ranking the scores that proposals contain activities. In our geometry-aware triplet loss, we use the top-50 proposals in each bag, i.e., b = 50. |
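The layer sizes quoted above fully determine the shapes in the projection module, so the setup can be sanity-checked with a minimal sketch. The code below is illustrative only, not the authors' implementation: it uses NumPy with randomly initialized weights, and the ReLU activation between layers is an assumption, since the excerpt does not name one.

```python
import numpy as np

def mlp(x, sizes, rng):
    """Forward pass through fully-connected layers (ReLU between layers).

    Weights are drawn randomly here purely to check tensor shapes; in the
    paper these would be learned parameters.
    """
    for i, (d_in, d_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
        x = x @ w
        if i < len(sizes) - 2:  # assumed: no activation after the last layer
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
k = 60                                        # top-k proposals per video
video_bag = rng.standard_normal((k, 4096))    # bag of 4096-dim proposal features
image_feat = rng.standard_normal((1, 128))    # one 128-dim activity-image feature

v_embed = mlp(video_bag, [4096, 500, 200, 64], rng)   # f_v: 4096 -> 500 -> 200 -> 64
u_embed = mlp(image_feat, [128, 100, 80, 64], rng)    # f_u: 128 -> 100 -> 80 -> 64

print(v_embed.shape, u_embed.shape)           # (60, 64) (1, 64)
```

Both modalities land in the shared r = 64 space, where the image embedding can be compared against each of the 60 proposal embeddings in the bag.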