On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems
Authors: Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation of the captioning system with our methodology uses an evaluation dataset of 1,000 images randomly selected from the MSCOCO validation dataset. All experiments were performed on Amazon Mechanical Turk. We report the system quality based on human assessments. |
| Researcher Affiliation | Collaboration | ¹ETH Zurich, Department of Computer Science, Switzerland; ²Microsoft Research, Redmond, WA, USA |
| Pseudocode | No | The paper describes the proposed methodology in text and diagrams but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | All components are individually trained on the MSCOCO dataset (Lin et al. 2014) which was built as an image captioning training set and benchmark. |
| Dataset Splits | Yes | We use images randomly sampled from the MSCOCO validation dataset to evaluate our approach. All components are individually trained on the MSCOCO dataset (Lin et al. 2014), which was built as an image captioning training set and benchmark. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper does not specify any software with version numbers, such as the programming languages, libraries, or frameworks used for the experiments. |
| Experiment Setup | No | The paper describes the methodology setup and troubleshooting steps for human-in-the-loop evaluation and fixing, but it does not provide specific hyperparameters or system-level training settings for the underlying machine learning models. |