Vision-Language Fusion for Object Recognition

Authors: Sz-Rung Shiang, Stephanie Rosenthal, Anatole Gershman, Jaime Carbonell, Jean Oh

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we achieve up to 9.4% and 16.6% accuracy improvements using the oracle and the detected bounding boxes, respectively, over the vision-only recognizers.
Researcher Affiliation | Academia | Sz-Rung Shiang, Stephanie Rosenthal, Anatole Gershman, Jaime Carbonell, Jean Oh; School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213; sshiang@andrew.cmu.edu, {srosenth, anatoleg, jgc, jeanoh}@cs.cmu.edu
Pseudocode | No | The paper describes the MultiRank algorithm descriptively and with equations, but does not include a structured pseudocode block or a figure explicitly labeled 'Algorithm'.
Open Source Code | No | The paper does not provide a repository link or any explicit statement about releasing its source code.
Open Datasets | Yes | We validate our algorithm on the NYU Depth V2 datasets (Silberman et al. 2012).
Dataset Splits | Yes | Using 5-fold cross validation, this vision-only model achieves an accuracy of 0.6299 and mAP of 0.7240 in the ground-truth bounding box case, and an accuracy of 0.4229 and mAP of 0.2820 in the detected bounding box case; 10 additional images were used for validation to tune the parameter α in Equation (3).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions software such as Caffe, AlexNet, and an SVM classifier, along with their respective citations, but it does not provide version numbers for these dependencies.
Experiment Setup | Yes | MultiRank includes two parameters, α and β. Parameter α represents the informativeness of contextual information in the re-ranking process... The parameter β similarly takes the confidence score of each boxgraph into account... These parameters were tuned empirically. Figure 5 shows that accuracy is maximized when the CV output and the contextual information are fused at roughly a 6:4 ratio when 10 relations are used.
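The Dataset Splits row refers to 5-fold cross validation plus a small held-out set of 10 images used to tune α in Equation (3). The sketch below is a minimal illustration of that protocol, assuming a scikit-learn KFold split over the 1449 labeled NYU Depth V2 images; the variable names and the library choice are assumptions, not the authors' code.

```python
import numpy as np
from sklearn.model_selection import KFold

# Minimal sketch (not the authors' code) of the evaluation protocol:
# 5-fold cross validation over NYU Depth V2 images, with 10 images held
# out to tune the parameter alpha in Equation (3).
image_ids = np.arange(1449)        # NYU Depth V2 provides 1449 labeled images
rng = np.random.default_rng(0)
rng.shuffle(image_ids)

val_ids = image_ids[:10]           # held-out images for tuning alpha
cv_ids = image_ids[10:]            # remaining images for 5-fold CV

for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(cv_ids)):
    train_ids, test_ids = cv_ids[train_idx], cv_ids[test_idx]
    # train the vision-only recognizer and the fusion model on train_ids,
    # then report accuracy / mAP on test_ids
    print(f"fold {fold}: {len(train_ids)} train / {len(test_ids)} test images")
```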
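The Experiment Setup row describes α as the weight that trades off the vision recognizer's output against contextual information, with accuracy peaking at roughly a 6:4 fusion ratio. Equation (3) itself is not reproduced on this page, so the following is only an illustrative sketch of an α-weighted linear re-ranking under that reading; the function name fuse_scores and the toy scores are assumptions.

```python
import numpy as np

def fuse_scores(vision_scores, context_scores, alpha=0.6):
    """Blend per-label vision confidences with contextual scores.

    alpha ~ 0.6 mirrors the roughly 6:4 vision-to-context ratio reported
    to maximize accuracy when 10 relations are used; the exact form of
    Equation (3) in the paper may differ from this linear blend.
    """
    vision_scores = np.asarray(vision_scores, dtype=float)
    context_scores = np.asarray(context_scores, dtype=float)
    return alpha * vision_scores + (1.0 - alpha) * context_scores

# Toy example: re-rank candidate labels for a single bounding box.
labels = ["chair", "table", "sofa"]
vision = [0.50, 0.30, 0.20]    # vision-only recognizer confidences
context = [0.20, 0.60, 0.20]   # scores from textual co-occurrence relations
fused = fuse_scores(vision, context, alpha=0.6)
print(labels[int(np.argmax(fused))])   # prints "table": context re-ranks the box
```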