Learning to Complement Humans

Authors: Bryan Wilder, Eric Horvitz, Ece Kamar

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate in two real-world domains (scientific discovery and medical diagnosis) that human-machine teams built via these methods outperform the individual performance of machines and people. We then analyze conditions under which this complementarity is strongest, and which training methods amplify it. We conducted experiments in two real-world domains to explore opportunities for human-machine complementarity and methods to best leverage the complementarity.
Researcher Affiliation | Collaboration | Bryan Wilder (1), Eric Horvitz (2), and Ece Kamar (2); (1) School of Engineering and Applied Sciences, Harvard University; (2) Microsoft Research
Pseudocode | Yes | Algorithm 1 (Joint VOI training), reconstructed below; an illustrative PyTorch rendering follows the table.

for T iterations do
    Sample a minibatch B ⊆ [n]
    for i ∈ B do
        for ŷ ∈ Y do
            u_nq(ŷ) = Σ_{y ∈ Y} p_α(y | x_i) · u(ŷ, y)
        end for
        u_nq = Σ_{ŷ ∈ Y} u_nq(ŷ) · exp(u_nq(ŷ)) / Σ_{y′ ∈ Y} exp(u_nq(y′))
        for ŷ ∈ Y do
            u_q(ŷ, h) = Σ_{y ∈ Y} p_γ(y | x_i, h) · u(ŷ, y)
        end for
        u_q = Σ_h p_β(h | x) · Σ_ŷ u_q(ŷ, h) · exp(u_q(ŷ, h)) / Σ_{y′ ∈ Y} exp(u_q(y′, h))
        q = exp(u_q) / (exp(u_q) + exp(u_nq))
        ℓ_i^combined = ℓ(q · p_γ(· | x_i, h_i) + (1 − q) · p_α(· | x_i)) + q · c
    end for
    Backpropagate (1 / |B|) · Σ_{i ∈ B} ℓ_i^combined
    Every t iterations: update the calibrators
end for
Open Source Code | No | The paper states that its implementation is based on [Vekariya, 2016], which provides a GitHub link (https://github.com/arjunvekariyagithub/camelyon16-grand-challenge). However, that repository is a third-party competition entry; the authors do not state that they are releasing their own source code for the method described in the paper.
Open Datasets | Yes | We first explore a scientific discovery task from the Galaxy Zoo project. We use 10,000 instances for training and 4,000 for testing. Each instance contains visual features which previous work extracted from the dataset [Lintott et al., 2008; Kamar et al., 2012]. We use data from the CAMELYON16 challenge [Bejnordi et al., 2017].
Dataset Splits | Yes | We use 10,000 instances for training and 4,000 for testing. The [CAMELYON16] dataset consists of 127 images. There are also 270 images without panel responses, with which we pretrain the ML models.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments. It only mentions the training of convolutional networks.
Software Dependencies | No | The paper mentions software components like "neural networks," "ReLU activations," "dropout," "gradient descent," "the Platt method," and "Inception-v3." However, it does not provide specific version numbers for any of these components or for underlying libraries (e.g., TensorFlow 2.x, PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | All use neural networks with ReLU activations and dropout (p = 0.2). In our implementation, we use neural networks trained via gradient descent, followed by a sigmoid calibrator trained using the Platt method. Our experiments vary the number of layers and hidden units to examine the impact of model capacity. We update the calibration layer every t steps to maintain well-calibrated probabilities. We train the model under a range of weightings of classification loss vs. query cost. (A minimal sketch of these components follows below.)
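The training loop in Algorithm 1 can be rendered as a short minibatch step. The PyTorch sketch below is an illustrative reconstruction, not the authors' released code: the model interfaces (model_alpha, model_beta, model_gamma, each assumed to return probability distributions), the utility matrix U, and the query cost c are hypothetical names chosen for this sketch.

import torch
import torch.nn.functional as F

def joint_voi_step(model_alpha, model_beta, model_gamma, U, c, x, h, y):
    # One minibatch step of the joint VOI objective in Algorithm 1 (sketch).
    # U: (|Y|, |Y|) utility matrix with U[yhat, ytrue] = u(yhat, y);
    # c: scalar cost of querying the human; h: observed human responses.
    p_alpha = model_alpha(x)                    # p_alpha(y | x), shape (B, |Y|)
    p_beta = model_beta(x)                      # p_beta(h | x), shape (B, |H|)

    # Expected utility of each prediction without querying the human,
    # aggregated with a softmax weighting (a soft version of max over yhat).
    u_nq = p_alpha @ U.T                        # (B, |Y|)
    U_nq = (F.softmax(u_nq, dim=1) * u_nq).sum(dim=1)   # (B,)

    # Expected utility when querying, averaged over possible human responses.
    U_q = torch.zeros_like(U_nq)
    for j in range(p_beta.shape[1]):
        h_j = torch.full((x.shape[0],), j, dtype=torch.long)
        u_q = model_gamma(x, h_j) @ U.T         # (B, |Y|)
        U_q = U_q + p_beta[:, j] * (F.softmax(u_q, dim=1) * u_q).sum(dim=1)

    # Soft query decision: q = exp(U_q) / (exp(U_q) + exp(U_nq)).
    q = torch.sigmoid(U_q - U_nq).unsqueeze(1)  # (B, 1)

    # Blend query / no-query predictive distributions; add expected query cost.
    p_combined = q * model_gamma(x, h) + (1 - q) * p_alpha
    loss = F.nll_loss(torch.log(p_combined + 1e-12), y) + c * q.mean()
    return loss

Note that sigmoid(U_q - U_nq) is algebraically identical to the exp-ratio in the pseudocode; writing it this way keeps the query decision differentiable, so all three models can be trained jointly by backpropagating the returned loss.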
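To make the experiment setup concrete, here is a minimal sketch of the two components it describes, assuming a feed-forward PyTorch classifier and a scikit-learn logistic regression as the Platt-style sigmoid calibrator. All names and the exact layer layout are assumptions; the paper only fixes ReLU activations, dropout p = 0.2, and variation of depth and width as capacity knobs.

import numpy as np
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

def make_classifier(in_dim, hidden_dim, n_layers, n_classes):
    # Feed-forward network with ReLU activations and dropout (p = 0.2),
    # as in the setup; n_layers and hidden_dim control model capacity.
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden_dim), nn.ReLU(), nn.Dropout(p=0.2)]
        d = hidden_dim
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)

class PlattCalibrator:
    # Sigmoid calibrator fit on held-out model scores (Platt's method);
    # refitting it every t training steps keeps probabilities well calibrated.
    def __init__(self):
        self._lr = LogisticRegression()

    def fit(self, scores, labels):
        # scores: (n,) uncalibrated model scores; labels: (n,) binary targets.
        self._lr.fit(np.asarray(scores).reshape(-1, 1), labels)
        return self

    def predict_proba(self, scores):
        # Returns calibrated P(y = 1 | score) via the fitted sigmoid.
        return self._lr.predict_proba(np.asarray(scores).reshape(-1, 1))[:, 1]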