Sherlock: Scalable Fact Learning in Images
Authors: Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied the investigated methods on several datasets that we augmented with structured facts, and a large-scale dataset of more than 202,000 facts and 814,000 images. Our results show the advantage of relating facts by structure with the proposed model compared to the baselines. |
| Researcher Affiliation | Collaboration | Mohamed Elhoseiny (Adobe Research; Rutgers University), Scott Cohen (Adobe Research), Walter Chang (Adobe Research), Brian Price (Adobe Research), Ahmed Elgammal (Rutgers University, Computer Science Department) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm", nor does it present structured, code-like procedural steps. |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We began our data collection by augmenting existing datasets with fact language view labels fl: PPMI (Yao and Fei-Fei 2010), Stanford40 (Yao et al. 2011), Pascal Actions (Everingham et al.), Sports (Gupta 2009), Visual Phrases (Sadeghi and Farhadi 2011), and INTERACT (Antol, Zitnick, and Parikh 2014b). |
| Dataset Splits | Yes | In this dataset, we randomly split all the annotations into an 80%-20% split, constructing sets of 647,746 (fv, fl) training pairs (with 171,269 unique fact language views fl) and 168,691 (fv, fl) testing pairs (with 58,417 unique fl), for a total of 816,436 (fv, fl) pairs and 202,946 unique fl. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as the GloVe 840B model and a Theano implementation, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the visual encoder, the shared layers θ_c^0 match the architecture of the convolutional and pooling layers of VGG-16 from conv1_1 through pool3, and have seven convolution layers. The subject layers θ_c^S and predicate-object layers θ_c^PO are two branches of convolution and pooling layers with the same architecture as the VGG-16 layers conv4_1 through pool5, which makes six convolution-pooling layers in each branch. Finally, θ_u^S and θ_u^PO are two instances of the fc6 and fc7 layers of the VGG-16 network. W^S, W^P, and W^O are initialized randomly, and the rest are initialized from VGG-16 trained on ImageNet. |
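
The Dataset Splits row describes a plain random 80%-20% split over (fv, fl) annotation pairs. The following minimal Python sketch illustrates that kind of split; the helper name `split_pairs` and the fixed seed are illustrative assumptions, not the authors' procedure.

```python
import random

def split_pairs(pairs, train_frac=0.8, seed=0):
    """Randomly split (fv, fl) annotation pairs into train/test subsets.

    Hypothetical sketch of an 80%-20% random split; the seed and the
    function name are assumptions for illustration only.
    """
    rng = random.Random(seed)
    shuffled = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```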
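
The Experiment Setup row describes a two-branch visual encoder built from VGG-16: a shared trunk (conv1_1 through pool3), separate subject and predicate-object branches (conv4_1 through pool5 plus fc6/fc7), and randomly initialized projections W^S, W^P, W^O. Since the authors' code is not public (per the Open Source Code row), the PyTorch sketch below is a hypothetical reconstruction under stated assumptions: the class name `SherlockEncoder`, the use of torchvision's VGG-16, and the 300-dimensional embedding size (chosen to match GloVe vectors) are all assumptions, not the paper's implementation.

```python
import copy

import torch
import torch.nn as nn
import torchvision.models as tv

class SherlockEncoder(nn.Module):
    """Hypothetical sketch of the two-branch VGG-16 encoder described above."""

    def __init__(self, embed_dim=300):
        super().__init__()
        vgg = tv.vgg16(weights=tv.VGG16_Weights.IMAGENET1K_V1)
        feats = list(vgg.features.children())
        fc67 = list(vgg.classifier.children())[:6]  # fc6/fc7 with ReLU+Dropout
        # Shared trunk theta_c^0: conv1_1 .. pool3 (seven conv layers,
        # indices 0..16 in torchvision's vgg16().features).
        self.shared = nn.Sequential(*feats[:17])
        # Subject branch theta_c^S and predicate-object branch theta_c^PO:
        # two independent copies of conv4_1 .. pool5 (six conv layers each),
        # both starting from the ImageNet-pretrained weights.
        self.subj_conv = nn.Sequential(*feats[17:])
        self.po_conv = copy.deepcopy(self.subj_conv)
        # theta_u^S and theta_u^PO: two instances of the fc6/fc7 layers.
        self.subj_fc = nn.Sequential(*fc67)
        self.po_fc = copy.deepcopy(self.subj_fc)
        # W^S, W^P, W^O: randomly initialized projections into the fact
        # embedding space (300-d here to match GloVe -- an assumption; the
        # quoted passage does not fix the dimensionality).
        self.W_S = nn.Linear(4096, embed_dim)
        self.W_P = nn.Linear(4096, embed_dim)
        self.W_O = nn.Linear(4096, embed_dim)

    def forward(self, x):
        h = self.shared(x)                                    # pool3 features
        s = self.subj_fc(torch.flatten(self.subj_conv(h), 1))
        po = self.po_fc(torch.flatten(self.po_conv(h), 1))
        # Subject embedding from the S branch; predicate and object
        # embeddings both come from the PO branch.
        return self.W_S(s), self.W_P(po), self.W_O(po)

# Usage: three fact-view embeddings from one 224x224 image batch.
emb_s, emb_p, emb_o = SherlockEncoder()(torch.randn(1, 3, 224, 224))
```

Splitting the trunk at pool3 lets the subject and predicate-object branches specialize while still sharing low-level features, which is consistent with the row's description of shared versus branch-specific layers.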