Learning to Complement Humans

Authors: Bryan Wilder, Eric Horvitz, Ece Kamar

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate in two real-world domains (scientific discovery and medical diagnosis) that human-machine teams built via these methods outperform the individual performance of machines and people. We then analyze conditions under which this complementarity is strongest, and which training methods amplify it. We conducted experiments in two real-world domains to explore opportunities for human-machine complementarity and methods to best leverage the complementarity.
Researcher Affiliation | Collaboration | Bryan Wilder (1), Eric Horvitz (2), and Ece Kamar (2); (1) School of Engineering and Applied Sciences, Harvard University; (2) Microsoft Research
Pseudocode | Yes | Algorithm 1 (Joint VOI training), reconstructed below; an illustrative PyTorch rendering follows the table.

for T iterations do
    Sample a minibatch B ⊆ [n]
    for i ∈ B do
        for ŷ ∈ Y do
            u_nq(ŷ) = Σ_{y ∈ Y} p_α(y | x_i) · u(ŷ, y)
        end for
        u_nq = Σ_{ŷ ∈ Y} u_nq(ŷ) · exp(u_nq(ŷ)) / Σ_{y′ ∈ Y} exp(u_nq(y′))
        for ŷ ∈ Y do
            u_q(ŷ, h) = Σ_{y ∈ Y} p_γ(y | x_i, h) · u(ŷ, y)
        end for
        u_q = Σ_h p_β(h | x) · Σ_ŷ u_q(ŷ, h) · exp(u_q(ŷ, h)) / Σ_{y′ ∈ Y} exp(u_q(y′, h))
        q = exp(u_q) / (exp(u_q) + exp(u_nq))
        ℓ_i^combined = ℓ(q · p_γ(· | x_i, h_i) + (1 − q) · p_α(· | x_i)) + q · c
    end for
    Backpropagate (1 / |B|) · Σ_{i ∈ B} ℓ_i^combined
    Every t iterations: update the calibrators
end for
Open Source Code | No | The paper states that its implementation is based on [Vekariya, 2016], which provides a GitHub link (https://github.com/arjunvekariyagithub/camelyon16-grand-challenge). However, that repository is a third-party competition entry; the authors do not state that they are releasing their own source code for the method described in the paper.
Open Datasets | Yes | We first explore a scientific discovery task from the Galaxy Zoo project. We use 10,000 instances for training and 4,000 for testing. Each instance contains visual features which previous work extracted from the dataset [Lintott et al., 2008; Kamar et al., 2012]. We use data from the CAMELYON16 challenge [Bejnordi et al., 2017].
Dataset Splits | Yes | We use 10,000 instances for training and 4,000 for testing. The [CAMELYON16] dataset consists of 127 images. There are also 270 images without panel responses, with which we pretrain the ML models.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running the experiments. It only mentions the training of convolutional networks.
Software Dependencies | No | The paper mentions software components like "neural networks," "ReLU activations," "dropout," "gradient descent," "the Platt method," and "Inception-v3." However, it does not provide specific version numbers for any of these components or for underlying libraries (e.g., TensorFlow 2.x, PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | All use neural networks with ReLU activations and dropout (p = 0.2). In our implementation, we use neural networks trained via gradient descent, followed by a sigmoid calibrator trained using the Platt method. Our experiments vary the number of layers and hidden units to examine the impact of model capacity. We update the calibration layer every t steps to maintain well-calibrated probabilities. We train the model under a range of weightings of classification loss vs. query cost. (A minimal sketch of these components follows below.)
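The training loop in Algorithm 1 can be rendered as a short minibatch step. The PyTorch sketch below is an illustrative reconstruction, not the authors' released code: the model interfaces (model_alpha, model_beta, model_gamma, each assumed to return probability distributions), the utility matrix U, and the query cost c are hypothetical names chosen for this sketch.

import torch
import torch.nn.functional as F

def joint_voi_step(model_alpha, model_beta, model_gamma, U, c, x, h, y):
    # One minibatch step of the joint VOI objective in Algorithm 1 (sketch).
    # U: (|Y|, |Y|) utility matrix with U[yhat, ytrue] = u(yhat, y);
    # c: scalar cost of querying the human; h: observed human responses.
    p_alpha = model_alpha(x)                    # p_alpha(y | x), shape (B, |Y|)
    p_beta = model_beta(x)                      # p_beta(h | x), shape (B, |H|)

    # Expected utility of each prediction without querying the human,
    # aggregated with a softmax weighting (a soft version of max over yhat).
    u_nq = p_alpha @ U.T                        # (B, |Y|)
    U_nq = (F.softmax(u_nq, dim=1) * u_nq).sum(dim=1)   # (B,)

    # Expected utility when querying, averaged over possible human responses.
    U_q = torch.zeros_like(U_nq)
    for j in range(p_beta.shape[1]):
        h_j = torch.full((x.shape[0],), j, dtype=torch.long)
        u_q = model_gamma(x, h_j) @ U.T         # (B, |Y|)
        U_q = U_q + p_beta[:, j] * (F.softmax(u_q, dim=1) * u_q).sum(dim=1)

    # Soft query decision: q = exp(U_q) / (exp(U_q) + exp(U_nq)).
    q = torch.sigmoid(U_q - U_nq).unsqueeze(1)  # (B, 1)

    # Blend query / no-query predictive distributions; add expected query cost.
    p_combined = q * model_gamma(x, h) + (1 - q) * p_alpha
    loss = F.nll_loss(torch.log(p_combined + 1e-12), y) + c * q.mean()
    return loss

Note that sigmoid(U_q - U_nq) is algebraically identical to the exp-ratio in the pseudocode; writing it this way keeps the query decision differentiable, so all three models can be trained jointly by backpropagating the returned loss.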
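To make the experiment setup concrete, here is a minimal sketch of the two components it describes, assuming a feed-forward PyTorch classifier and a scikit-learn logistic regression as the Platt-style sigmoid calibrator. All names and the exact layer layout are assumptions; the paper only fixes ReLU activations, dropout p = 0.2, and variation of depth and width as capacity knobs.

import numpy as np
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

def make_classifier(in_dim, hidden_dim, n_layers, n_classes):
    # Feed-forward network with ReLU activations and dropout (p = 0.2),
    # as in the setup; n_layers and hidden_dim control model capacity.
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden_dim), nn.ReLU(), nn.Dropout(p=0.2)]
        d = hidden_dim
    layers.append(nn.Linear(d, n_classes))
    return nn.Sequential(*layers)

class PlattCalibrator:
    # Sigmoid calibrator fit on held-out model scores (Platt's method);
    # refitting it every t training steps keeps probabilities well calibrated.
    def __init__(self):
        self._lr = LogisticRegression()

    def fit(self, scores, labels):
        # scores: (n,) uncalibrated model scores; labels: (n,) binary targets.
        self._lr.fit(np.asarray(scores).reshape(-1, 1), labels)
        return self

    def predict_proba(self, scores):
        # Returns calibrated P(y = 1 | score) via the fitted sigmoid.
        return self._lr.predict_proba(np.asarray(scores).reshape(-1, 1))[:, 1]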