On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems

Authors: Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The evaluation of the captioning system with our methodology uses an evaluation dataset of 1000 images randomly selected from the MSCOCO validation dataset. All experiments were performed on Amazon Mechanical Turk. We report the system quality based on human assessments. |
| Researcher Affiliation | Collaboration | (1) ETH Zurich, Department of Computer Science, Switzerland; (2) Microsoft Research, Redmond, WA, USA |
| Pseudocode | No | The paper describes the proposed methodology in text and diagrams but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | Yes | All components are individually trained on the MSCOCO dataset (Lin et al. 2014), which was built as an image captioning training set and benchmark. |
| Dataset Splits | Yes | We use images randomly sampled from the validation dataset to evaluate our approach. All components are individually trained on the MSCOCO dataset (Lin et al. 2014), which was built as an image captioning training set and benchmark. (A sketch of how such an evaluation sample could be drawn follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper does not name any software with version numbers, such as programming languages, libraries, or frameworks used for the experiments. |
| Experiment Setup | No | The paper describes the human-in-the-loop troubleshooting methodology (evaluation and fixing steps) but does not provide hyperparameters or system-level training settings for the underlying machine learning models. |
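The Dataset Splits row states only that the evaluation set consists of 1000 images randomly sampled from the MSCOCO validation set; the paper does not report a sampling seed or release the sampled image ids. The following is a minimal sketch of how such an evaluation sample could be drawn, assuming the standard `instances_val2014.json` annotation file; the file path, seed, and output file name are illustrative assumptions, not details from the paper.

```python
"""Sketch: drawing a 1000-image evaluation sample from the MSCOCO validation set.

This is a minimal illustration, not the authors' released code. The annotation
path, the seed, and the output file name are assumptions; the paper only states
that 1000 images were randomly selected from the MSCOCO validation dataset
(Lin et al. 2014).
"""
import json
import random

ANNOTATION_FILE = "annotations/instances_val2014.json"  # assumed local path to MSCOCO val annotations
SAMPLE_SIZE = 1000                                      # sample size stated in the paper
SEED = 0                                                # assumed; the paper does not report a seed

with open(ANNOTATION_FILE) as f:
    annotations = json.load(f)

# MSCOCO annotation files list their images under the "images" key.
image_ids = [img["id"] for img in annotations["images"]]

random.seed(SEED)
evaluation_ids = random.sample(image_ids, SAMPLE_SIZE)

# Persist the sampled ids so the same evaluation set can be reused
# across human-assessment (Mechanical Turk) runs.
with open("evaluation_image_ids.json", "w") as f:
    json.dump(sorted(evaluation_ids), f)

print(f"Selected {len(evaluation_ids)} of {len(image_ids)} validation images.")
```

Recording the sampled ids (or a seed) is what would make the human-assessment evaluation repeatable; the paper itself reports neither, which is why the exact evaluation split cannot be reconstructed from the text alone.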