reproducibilityindex.ai

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Authors: Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal

AAAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state-of-the-art that uses big descriptions from 12.6% to 13.5% with MAP metric and 0.73 to 0.83 with ROC-AUC metric.
Researcher Affiliation	Collaboration	Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed Elgammal; m.elhoseiny@cs.rutgers.edu, {jingen.liu,hui.cheng,harpreet.sawhney}@sri.com, elgammal@cs.rutgers.edu; Rutgers University, Computer Science Department; SRI International, Vision and Learning Group
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Supplementary Materials (SM) could be found here https://sites.google.com/site/mhelhoseiny/EDiSE_supp.zip
Open Datasets	Yes	We evaluated our method on the large TRECVID MED (Felzenszwalb, McAllester, and Ramanan 2013).
Dataset Splits	No	The paper mentions evaluating on the 'MEDTest set' of TRECVID MED, but does not explicitly provide details about specific training/validation/test splits for their experiments on this dataset (e.g., percentages, sample counts, or explicit standard split citations for all three partitions).
Hardware Specification	Yes	it takes 270 seconds on a 16 cores Intel Xeon processor (64GB RAM) to the retrieval task on 20 events altogether.
Software Dependencies	No	The paper mentions various models and tools used (e.g., Mikolov et al. 2013b for word embedding, Overfeat, SIFT, HOG), but it does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup	Yes	In practice, we only include $\theta_v(c_i)$ in $\psi(v_c)$ such that $c_i$ is among the top $R$ concepts with highest $p(e_c \mid c_i)$. This is assuming that the remaining concepts are assigned $p(e_c \mid c_i) = 0$ which makes those items vanish; we used $R = 5$. We fuse $p(e_c \mid v)$, $p(e_o \mid v)$, and $p(e_a \mid v)$ by weighted geometric mean with focus on visual concepts, i.e. $p(e \mid v) = \sqrt[w+1]{p(e_c \mid v)^{w} \sqrt{p(e_o \mid v)\, p(e_a \mid v)}}$; $w = 6$. $M = 250$. $l = 50\%$ (i.e., median). $M = 300$.
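
The two setup choices quoted in the Experiment Setup row (keeping only the top R concepts, and fusing three per-modality scores with a weighted geometric mean) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. The function names `top_r_concepts` and `fuse_scores` are hypothetical, and the exact form of the fusion is an assumption: the formula is garbled in the extracted text, so the sketch assumes a normalized weighted geometric mean in which the visual-concept score carries weight w and the other two scores share the remaining unit weight.

```python
import math


def top_r_concepts(relevance, r=5):
    """Return the indices of the r concepts with the highest p(e_c | c_i).

    Concepts outside this set are treated as having relevance 0,
    so they drop out of any later sum (hypothetical helper).
    """
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return ranked[:r]


def fuse_scores(p_c, p_o, p_a, w=6.0):
    """Fuse three per-modality scores with a weighted geometric mean.

    Assumed form: (p_c**w * sqrt(p_o * p_a)) ** (1 / (w + 1)),
    computed in log space for numerical stability. p_c is the
    visual-concept score; p_o and p_a are the other two scores.
    """
    if min(p_c, p_o, p_a) <= 0.0:
        # A zero score in any modality drives a geometric mean to zero.
        return 0.0
    log_score = (w * math.log(p_c) + 0.5 * (math.log(p_o) + math.log(p_a))) / (w + 1.0)
    return math.exp(log_score)


if __name__ == "__main__":
    relevance = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05]
    print(top_r_concepts(relevance, r=3))
    print(fuse_scores(0.5, 1.0, 1.0, w=6.0))
```

With w = 6 the visual-concept score dominates the fused value, which matches the stated "focus on visual concepts"; under this assumed form the exponents sum to one, so three equal inputs fuse to that same value.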