A Proposal-Based Approach for Activity Image-to-Video Retrieval
Authors: Ruicong Xu, Li Niu, Jianfu Zhang, Liqing Zhang. Pages 12524-12531.
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three widely-used datasets verify the effectiveness of our approach. |
| Researcher Affiliation | Academia | MoE Key Lab of Artificial Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. {ranranxu, utscnewly, c.sis}@sjtu.edu.cn, zhang-lq@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes the proposed algorithm using textual descriptions and mathematical formulas, but it does not include a formally structured pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Therefore, we construct video-image datasets for the AIVR task based on public video datasets, i.e., the THUMOS 14 (http://crcv.ucf.edu/THUMOS14/), ActivityNet (Heilbron et al. 2015) and MED2017 Event (https://www.nist.gov/itl/iad/mig/med-2017-evaluation/) datasets, in which THUMOS 14 and ActivityNet are action-based datasets while MED2017 Event is an event-based dataset. |
| Dataset Splits | No | The paper specifies training and testing pairs for each dataset (e.g., 'for THUMOS 14 dataset, we form 1500 training pairs and 406 testing pairs'), but does not explicitly mention a dedicated validation dataset split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud computing instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions the use of models like VGG and R-C3D and discusses feature extraction, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | In the projection module, mapping functions f_v(·) (resp., f_u(·)) are implemented as three fully-connected layers as follows: f_v: v (d1 = 4096) → 500 → 200 → v̂ (r = 64) and f_u: u (d2 = 128) → 100 → 80 → û (r = 64). [...] where α and β are trade-off parameters, empirically fixed as 0.1 and 10 respectively in our experiments. [...] We extract a 4096-dim feature vector for each activity proposal, and each video is represented by a bag of top-60 proposal features (i.e., k = 60), obtained by ranking the scores that proposals contain activities. In our geometry-aware triplet loss, we use the top-50 proposals in each bag, i.e., b = 50. |
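The layer sizes quoted above fully determine the shapes in the projection module, so the setup can be sanity-checked with a minimal sketch. The code below is illustrative only, not the authors' implementation: it uses NumPy with randomly initialized weights, and the ReLU activation between layers is an assumption, since the excerpt does not name one.

```python
import numpy as np

def mlp(x, sizes, rng):
    """Forward pass through fully-connected layers (ReLU between layers).

    Weights are drawn randomly here purely to check tensor shapes; in the
    paper these would be learned parameters.
    """
    for i, (d_in, d_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
        x = x @ w
        if i < len(sizes) - 2:  # assumed: no activation after the last layer
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
k = 60                                        # top-k proposals per video
video_bag = rng.standard_normal((k, 4096))    # bag of 4096-dim proposal features
image_feat = rng.standard_normal((1, 128))    # one 128-dim activity-image feature

v_embed = mlp(video_bag, [4096, 500, 200, 64], rng)   # f_v: 4096 -> 500 -> 200 -> 64
u_embed = mlp(image_feat, [128, 100, 80, 64], rng)    # f_u: 128 -> 100 -> 80 -> 64

print(v_embed.shape, u_embed.shape)           # (60, 64) (1, 64)
```

Both modalities land in the shared r = 64 space, where the image embedding can be compared against each of the 60 proposal embeddings in the bag.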