Effective Human-AI Teams via Learned Natural Language Rules and Onboarding

Authors: Hussein Mozannar, Jimin Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through user studies on object detection and question-answering tasks, we show that our method can lead to more accurate human-AI teams. We also evaluate our region discovery and description algorithms separately.
Researcher Affiliation | Collaboration | 1 MIT-IBM Watson AI Lab, Cambridge, MA; 2 CSAIL and IMES, Massachusetts Institute of Technology, Cambridge, MA; 3 IBM Research, Cambridge, MA
Pseudocode | Yes | Algorithm 1 IntegrAI-Describe. Input: Dataset D, region Nk
Open Source Code | Yes | Code is available at https://github.com/clinicalml/onboarding_human_ai.
Open Datasets | Yes | The image datasets include Berkeley Deep Drive (BDD) [83], where the task is to detect the presence of traffic lights in noisy images... and the validation set of MS-COCO (5k), where the task is whether a person is present in the image [48]. The text-based validation datasets comprise Massive Multi-task Language Understanding (MMLU) [33] and the Dynamic Sentiment Analysis Dataset (DynaSent) [61].
Dataset Splits | No | Each dataset is split in a 70-30 ratio for training and testing five different times to obtain error bars on predictions.
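The repeated-split protocol quoted above can be sketched as follows. This is a minimal illustration of the idea (five independent 70-30 shuffles, with error bars taken over the per-split metric), not the authors' actual pipeline; the dataset size and placeholder metric are assumptions.

```python
import numpy as np

def repeated_splits(n_examples, n_repeats=5, train_frac=0.7, seed=0):
    """Yield (train_idx, test_idx) pairs for several random 70-30 splits."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * n_examples)
    for _ in range(n_repeats):
        perm = rng.permutation(n_examples)  # fresh shuffle each repeat
        yield perm[:n_train], perm[n_train:]

# Evaluate some per-split metric (a placeholder here) and report
# mean +/- std over the five splits, i.e. the error bars.
scores = []
for train_idx, test_idx in repeated_splits(n_examples=1000):
    scores.append(float(len(test_idx)) / 1000.0)  # stand-in for test accuracy
print(f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Reseeding a single generator once and shuffling per repeat keeps the five splits reproducible while still being independent draws.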
Hardware Specification | Yes | All experiments are run on a GeForce GTX 1080 Ti.
Software Dependencies | No | The paper mentions specific models and libraries used (e.g., the Flan-T5 model, the RoBERTa-base model, a sentence transformer, CLIP), but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | For our method, we set βu = 0.5, βl = 0.01, α = 0.0 for Aim 1 and βu = 0.1, βl = 0.01, α = 0.5 for Aim 2, with random prior decisions (50-50 for 0 and 1).
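The reported settings can be collected into a small configuration table. This is a hypothetical layout for readability only; the key names mirror the paper's symbols and are not taken from the released repository.

```python
# Hypothetical grouping of the reported hyperparameters per aim.
CONFIGS = {
    "aim1": {"beta_u": 0.5, "beta_l": 0.01, "alpha": 0.0},
    "aim2": {"beta_u": 0.1, "beta_l": 0.01, "alpha": 0.5},
}

# Random prior decisions: 50-50 over the two labels.
PRIOR = {0: 0.5, 1: 0.5}

for aim, params in CONFIGS.items():
    print(aim, params)
```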