On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

Authors: Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The evaluation of the captioning system with our methodology uses an evaluation dataset of 1000 images randomly selected from the MSCOCO validation dataset. All experiments were performed on Amazon Mechanical Turk. We report the system quality based on human assessments. |
| Researcher Affiliation | Collaboration | (1) ETH Zurich, Department of Computer Science, Switzerland; (2) Microsoft Research, Redmond, WA, USA |
| Pseudocode | No | The paper describes the proposed methodology in text and diagrams but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | All components are individually trained on the MSCOCO dataset (Lin et al. 2014), which was built as an image captioning training set and benchmark. |
| Dataset Splits | Yes | We use images randomly sampled from the validation dataset to evaluate our approach. All components are individually trained on the MSCOCO dataset (Lin et al. 2014), which was built as an image captioning training set and benchmark. (A sketch of how such an evaluation sample could be drawn follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper does not name any software with version numbers, such as programming languages, libraries, or frameworks used for the experiments. |
| Experiment Setup | No | The paper describes the human-in-the-loop troubleshooting methodology (evaluation and fixing steps) but does not provide hyperparameters or system-level training settings for the underlying machine learning models. |
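The Dataset Splits row states only that the evaluation set consists of 1000 images randomly sampled from the MSCOCO validation set; the paper does not report a sampling seed or release the sampled image ids. The following is a minimal sketch of how such an evaluation sample could be drawn, assuming the standard `instances_val2014.json` annotation file; the file path, seed, and output file name are illustrative assumptions, not details from the paper.

```python
"""Sketch: drawing a 1000-image evaluation sample from the MSCOCO validation set.

This is a minimal illustration, not the authors' released code. The annotation
path, the seed, and the output file name are assumptions; the paper only states
that 1000 images were randomly selected from the MSCOCO validation dataset
(Lin et al. 2014).
"""
import json
import random

ANNOTATION_FILE = "annotations/instances_val2014.json"  # assumed local path to MSCOCO val annotations
SAMPLE_SIZE = 1000                                      # sample size stated in the paper
SEED = 0                                                # assumed; the paper does not report a seed

with open(ANNOTATION_FILE) as f:
    annotations = json.load(f)

# MSCOCO annotation files list their images under the "images" key.
image_ids = [img["id"] for img in annotations["images"]]

random.seed(SEED)
evaluation_ids = random.sample(image_ids, SAMPLE_SIZE)

# Persist the sampled ids so the same evaluation set can be reused
# across human-assessment (Mechanical Turk) runs.
with open("evaluation_image_ids.json", "w") as f:
    json.dump(sorted(evaluation_ids), f)

print(f"Selected {len(evaluation_ids)} of {len(image_ids)} validation images.")
```

Recording the sampled ids (or a seed) is what would make the human-assessment evaluation repeatable; the paper itself reports neither, which is why the exact evaluation split cannot be reconstructed from the text alone.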