Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Vision-Language Fusion for Object Recognition
Authors: Sz-Rung Shiang, Stephanie Rosenthal, Anatole Gershman, Jaime Carbonell, Jean Oh
AAAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we achieve up to 9.4% and 16.6% accuracy improvements using the oracle and the detected bounding boxes, respectively, over the vision-only recognizers. |
| Researcher Affiliation | Academia | Sz-Rung Shiang, Stephanie Rosenthal, Anatole Gershman, Jaime Carbonell, Jean Oh School of Computer Science, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, Pennsylvania, 15213 EMAIL, EMAIL |
| Pseudocode | No | The paper describes the MultiRank algorithm in prose and with equations, but does not include a structured pseudocode block or a figure explicitly labeled 'Algorithm'. |
| Open Source Code | No | The paper does not provide any specific repository link or explicit statement about the release of its source code. |
| Open Datasets | Yes | We validate our algorithm on the NYU Depth V2 datasets (Silberman et al. 2012). |
| Dataset Splits | Yes | Using 5-fold cross validation, this vision-only model achieves an accuracy of 0.6299 and mAP 0.7240 in the ground-truth bounding box case and accuracy 0.4229 and mAP 0.2820 in the detected bounding box case. ... and 10 additional images were used for validation to tune the parameter α in Equation (3). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as Caffe, AlexNet, and an SVM classifier, along with their respective citations, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | MultiRank includes two parameters: α and β. Parameter α represents the informativeness of contextual information in the re-ranking process... The parameter β similarly takes the confidence score of each boxgraph into account... These parameters were tuned empirically. Figure 5 shows that the accuracy is maximized when the CV output and the contextual information are fused at around 6:4 ratio when 10 relations are used. |
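
The Experiment Setup row reports that accuracy peaks when the CV output and the contextual information are fused at a ratio of roughly 6:4. The sketch below illustrates what such a weighting means for a single bounding box. It is not the paper's MultiRank algorithm, and Equation (3) is not quoted in this summary: the linear blend, the function names `fuse_scores` and `top_label`, and all score values are assumptions made purely for illustration, with `alpha=0.4` standing in for the weight given to context.

```python
# Illustrative only: a simple convex combination of two score sources.
# This is NOT the paper's MultiRank; names and numbers are hypothetical.

def fuse_scores(vision_scores, context_scores, alpha=0.4):
    """Blend per-label scores as (1 - alpha) * vision + alpha * context.

    alpha is the assumed weight on contextual information, so alpha=0.4
    corresponds to a 6:4 split between vision and context.
    """
    labels = set(vision_scores) | set(context_scores)
    return {
        label: (1.0 - alpha) * vision_scores.get(label, 0.0)
        + alpha * context_scores.get(label, 0.0)
        for label in labels
    }


def top_label(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for one bounding box.
vision = {"table": 0.50, "desk": 0.45, "chair": 0.05}
context = {"table": 0.20, "desk": 0.70, "chair": 0.10}

fused = fuse_scores(vision, context, alpha=0.4)
print(top_label(vision))  # vision-only prediction
print(top_label(fused))   # prediction after blending in context
```

With these made-up numbers, the vision-only ranking prefers "table", while the blended ranking prefers "desk", which is the kind of re-ranking effect a context weight is meant to produce.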