Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
Authors: Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state-of-the-art that uses full event descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. |
| Researcher Affiliation | Collaboration | Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal; m.elhoseiny@cs.rutgers.edu, {jingen.liu, hui.cheng, harpreet.sawhney}@sri.com, elgammal@cs.rutgers.edu; Rutgers University, Computer Science Department; SRI International, Vision and Learning Group |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Supplementary Materials (SM) could be found here https://sites.google.com/site/mhelhoseiny/EDiSE_supp.zip |
| Open Datasets | Yes | We evaluated our method on the large TRECVID MED (Felzenszwalb, McAllester, and Ramanan 2013). |
| Dataset Splits | No | The paper mentions evaluating on the 'MEDTest set' of TRECVID MED, but does not explicitly provide details about specific training/validation/test splits for their experiments on this dataset (e.g., percentages, sample counts, or explicit standard split citations for all three partitions). |
| Hardware Specification | Yes | it takes 270 seconds on a 16-core Intel Xeon processor (64 GB RAM) to perform the retrieval task on 20 events altogether. |
| Software Dependencies | No | The paper mentions various models and tools used (e.g., Mikolov et al. 2013b for word embeddings, OverFeat, SIFT, HOG), but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In practice, we only include θv(ci) in ψ(vc) such that ci is among the top R concepts with highest p(ec|ci). This assumes the remaining concepts are assigned p(ec|ci) = 0, which makes those terms vanish; we used R = 5. We fuse p(ec|v), p(eo|v), and p(ea|v) by weighted geometric mean with focus on visual concepts, i.e., p(e|v) = (p(ec|v)^w · p(eo|v) · p(ea|v))^(1/(w+1)); w = 6. M = 250. l = 50% (i.e., median). M = 300. |
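
The experiment-setup excerpt above is detailed enough to sketch the two numerical steps it describes: pooling a video embedding ψ(vc) from only the top-R concepts ranked by p(ec|ci), and fusing the visual-concept, OCR, and ASR scores by a weighted geometric mean. The sketch below is a minimal illustration under assumptions, not the authors' code: the function and variable names are hypothetical, the array sizes (250 concepts, 300-d word vectors) are placeholders, and the (w + 1) normalization in the fusion is a reconstruction of the garbled formula rather than something the paper confirms.

```python
import numpy as np

def pool_video_embedding(concept_vectors, concept_scores, event_concept_probs, R=5):
    """Pool concept word vectors into a video embedding psi(v_c), keeping only the
    concepts whose relatedness to the event, p(ec|ci), is among the top R; all other
    concepts are treated as p(ec|ci) = 0 and drop out of the sum (paper uses R = 5)."""
    top = np.argsort(event_concept_probs)[-R:]           # indices of the top-R concepts
    weights = concept_scores[top]                        # theta_v(ci): detection scores in the video
    return weights @ concept_vectors[top] / (weights.sum() + 1e-12)

def fuse_modalities(p_ec_v, p_eo_v, p_ea_v, w=6):
    """Weighted geometric mean with extra weight w on the visual-concept channel.
    The 1/(w + 1) exponent is an assumption; the paper states only a 'weighted
    geometric mean with focus on visual concepts', with w = 6."""
    return (p_ec_v ** w * p_eo_v * p_ea_v) ** (1.0 / (w + 1))

# Toy usage with made-up scores for a single video (sizes are assumptions):
rng = np.random.default_rng(0)
concept_vectors = rng.standard_normal((250, 300))   # one word vector per concept
concept_scores = rng.random(250)                    # theta_v(ci): concept detections
event_concept_probs = rng.random(250)               # p(ec|ci): event-concept relatedness

psi_vc = pool_video_embedding(concept_vectors, concept_scores, event_concept_probs, R=5)
p_e_v = fuse_modalities(p_ec_v=0.7, p_eo_v=0.4, p_ea_v=0.3, w=6)
```

Because w = 6, the visual-concept score dominates the fused probability, which matches the quoted "focus on visual concepts"; the OCR and ASR channels only nudge the ranking.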