Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Extracting Visual Knowledge from the Web with Multimodal Learning
Authors: Dihong Gong, Daisy Zhe Wang
IJCAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results based on 46 object categories show that the extraction precision is improved significantly from 73% (with state-of-the-art deep learning programs) to 81%, which is equivalent to a 31% reduction in error rates. |
| Researcher Affiliation | Academia | Dihong Gong, Daisy Zhe Wang Department of Computer and Information Science and Engineering University of Florida {gongd, daisyw}@ufl.edu |
| Pseudocode | No | The paper describes algorithms but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | In this paper, we have applied the hierarchical softmax whose implementation is based on Google word2vec¹. ¹ https://code.google.com/archive/p/word2vec |
| Open Datasets | Yes | We evaluate our approach based on a collection of web pages and images derived from the Common Crawl dataset [Smith et al., 2013] that is publicly available on Amazon S3. |
| Dataset Splits | No | The paper mentions training data for visual object detectors and evaluation sample sizes, but does not specify explicit train/validation/test splits for the main dataset or experiments. |
| Hardware Specification | Yes | The Caffe Net models with feature dimension of 4096 were trained on a NVIDIA Tesla K40c GPU. |
| Software Dependencies | No | Parse the HTML webpages, with a C++ open-source program Gumbo-Parser by Google. |
| Experiment Setup | Yes | For multimodal embedding, we set the dimension of vector representations as 500 (we found that dimensions between 100 and 1000 give similar results) according to the recommendation from [Frome et al., 2013]. For structure learning, we tune the λ parameter in Equation (7) on training data such that the number of non-zero elements is around 100 for the θ parameter. |