Extracting Visual Knowledge from the Web with Multimodal Learning

Authors: Dihong Gong, Daisy Zhe Wang

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Experimental results based on 46 object categories show that the extraction precision is improved significantly from 73% (with state-of-the-art deep learning programs) to 81%, which is equivalent to a 31% reduction in error rates. (see the error-rate check below)

Researcher Affiliation | Academia | Dihong Gong, Daisy Zhe Wang, Department of Computer and Information Science and Engineering, University of Florida, {gongd, daisyw}@ufl.edu

Pseudocode | No | The paper describes algorithms but does not include any clearly labeled pseudocode or algorithm blocks.

Open Source Code | No | In this paper, we have applied the hierarchical softmax whose implementation is based on Google word2vec [1]. [1] https://code.google.com/archive/p/word2vec (see the word2vec sketch below)

Open Datasets | Yes | We evaluate our approach based on a collection of web pages and images derived from the Common Crawl dataset [Smith et al., 2013] that is publicly available on Amazon S3. (see the Common Crawl sketch below)

Dataset Splits | No | The paper mentions training data for visual object detectors and evaluation sample sizes, but does not specify explicit train/validation/test splits for the main dataset or experiments.

Hardware Specification | Yes | The Caffe Net models with feature dimension of 4096 were trained on a NVIDIA Tesla K40c GPU. (see the feature-extraction sketch below)

Software Dependencies | No | Parse the HTML webpages, with a C++ open-source program Gumbo-Parser by Google. (see the HTML-parsing sketch below)

Experiment Setup | Yes | For multimodal embedding, we set the dimension of vector representations as 500 (we found that dimensions between 100 and 1000 give similar results) according to the recommendation from [Frome et al., 2013]. For structure learning, we tune the λ parameter in Equation (7) on training data such that the number of non-zero elements is around 100 for the θ parameter. (see the sparsity-tuning sketch below)
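
A quick check of the headline numbers in the Research Type row: a precision gain from 73% to 81% means the error rate falls from 27% to 19%, a relative reduction of roughly 30%, consistent with the reported 31% up to rounding of the underlying precision values.

    \frac{(1 - 0.73) - (1 - 0.81)}{1 - 0.73} = \frac{0.08}{0.27} \approx 0.30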
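
The Open Source Code row notes that the multimodal embedding uses hierarchical softmax as implemented in Google's word2vec C tool, and the Experiment Setup row fixes the embedding dimension at 500. The authors' training code is not released, so the following is only a minimal sketch of an analogous configuration in gensim (a stand-in library, not the paper's implementation), with hierarchical softmax enabled and negative sampling disabled; the toy corpus is purely illustrative.

    # Sketch: 500-dimensional embeddings with hierarchical softmax, using gensim
    # as a stand-in for the Google word2vec C tool cited in the paper.
    from gensim.models import Word2Vec

    # Toy corpus; the paper trains on text derived from crawled web pages.
    sentences = [
        ["dog", "running", "on", "grass"],
        ["cat", "sitting", "on", "sofa"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=500,  # embedding dimension reported in the Experiment Setup row
        hs=1,             # hierarchical softmax, as in the quoted passage
        negative=0,       # disable negative sampling when hs is used
        min_count=1,
        window=5,
    )

    print(model.wv["dog"].shape)  # -> (500,)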
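
The Open Datasets row points to the Common Crawl corpus hosted on Amazon S3. The paper does not publish its extraction code, so the sketch below is only a hypothetical illustration of reading one locally downloaded WARC archive with the third-party warcio library; the archive file name is a placeholder.

    # Sketch: iterate over HTTP response records in a Common Crawl WARC archive.
    # Requires the third-party warcio package; the archive path is a placeholder.
    from warcio.archiveiterator import ArchiveIterator

    def iter_pages(warc_path):
        """Yield (url, raw_bytes) for each HTTP response record in the archive."""
        with open(warc_path, "rb") as stream:
            for record in ArchiveIterator(stream):
                if record.rec_type != "response":
                    continue
                url = record.rec_headers.get_header("WARC-Target-URI")
                body = record.content_stream().read()
                yield url, body

    for url, body in iter_pages("example-commoncrawl-segment.warc.gz"):
        print(url, len(body))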
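
The Hardware Specification row mentions CaffeNet models with 4096-dimensional features trained on a Tesla K40c. Below is a minimal pycaffe sketch for pulling such features from the fc7 layer; the deploy prototxt, weight file, and image path are placeholders, and the preprocessing is deliberately simplified (a full pipeline would also subtract the training mean and swap channels to BGR).

    # Sketch: extract a 4096-D fc7 feature vector with pycaffe.
    # Model, weight, and image paths are placeholders; preprocessing is simplified.
    import numpy as np
    import caffe

    caffe.set_mode_gpu()  # the paper reports training on an NVIDIA Tesla K40c

    net = caffe.Net("deploy.prototxt", "caffenet.caffemodel", caffe.TEST)

    img = caffe.io.load_image("example.jpg")          # HxWx3 float array in [0, 1]
    img = caffe.io.resize_image(img, (227, 227))      # CaffeNet input resolution
    blob = img.transpose(2, 0, 1)[np.newaxis, ...]    # -> shape (1, 3, 227, 227)

    net.blobs["data"].reshape(*blob.shape)
    net.blobs["data"].data[...] = blob
    net.forward()

    features = net.blobs["fc7"].data[0].copy()        # 4096-D feature vector
    print(features.shape)                             # -> (4096,)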
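
The Software Dependencies row quotes the paper's use of Google's Gumbo-Parser, a C++ HTML parsing library. To keep the examples in a single language, the sketch below performs the same basic step, collecting image sources and the surrounding text, with Python's standard-library html.parser; it is a stand-in, not the authors' pipeline.

    # Sketch: collect <img> sources and visible text from a web page using the
    # standard-library HTMLParser, as a stand-in for the C++ Gumbo-Parser
    # referenced in the paper.
    from html.parser import HTMLParser

    class ImageTextExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.images = []   # src attributes of <img> tags
            self.texts = []    # text fragments surrounding them

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                src = dict(attrs).get("src")
                if src:
                    self.images.append(src)

        def handle_data(self, data):
            data = data.strip()
            if data:
                self.texts.append(data)

    parser = ImageTextExtractor()
    parser.feed("<html><body><p>A dog running</p><img src='dog.jpg'></body></html>")
    print(parser.images, parser.texts)  # ['dog.jpg'] ['A dog running']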
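
The Experiment Setup row states that λ in Equation (7) is tuned on training data so that θ ends up with roughly 100 non-zero elements. The paper's structure-learning objective is not reproduced here; the sketch below only illustrates the generic procedure of sweeping an L1 penalty until the desired sparsity is reached, using scikit-learn's Lasso on synthetic data as a stand-in.

    # Sketch: sweep an L1 penalty until roughly 100 coefficients remain non-zero.
    # Lasso on synthetic data is a stand-in for the paper's Equation (7) objective.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 1000))
    y = X[:, :100] @ rng.standard_normal(100) + 0.1 * rng.standard_normal(500)

    target_nonzeros = 100
    for lam in np.logspace(-3, 1, 30):          # candidate penalty strengths
        model = Lasso(alpha=lam, max_iter=10000).fit(X, y)
        nonzeros = np.count_nonzero(model.coef_)
        if nonzeros <= target_nonzeros:         # first λ giving <= 100 non-zeros
            print(f"lambda={lam:.4f}, non-zero coefficients={nonzeros}")
            break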