Effective Human-AI Teams via Learned Natural Language Rules and Onboarding
Authors: Hussein Mozannar, Jimin Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through user studies on object detection and question-answering tasks, we show that our method can lead to more accurate human-AI teams. We also evaluate our region discovery and description algorithms separately. |
| Researcher Affiliation | Collaboration | 1MIT-IBM Watson AI Lab, Cambridge, MA 2CSAIL and IMES, Massachusetts Institute of Technology, Cambridge, MA 3IBM Research, Cambridge, MA |
| Pseudocode | Yes | Algorithm 1 IntegrAI-Describe. Input: Dataset D, region Nk |
| Open Source Code | Yes | Code is available in https://github.com/clinicalml/onboarding_human_ai. |
| Open Datasets | Yes | The image datasets include Berkeley Deep Drive (BDD) [83], where the task is to detect the presence of traffic lights in noisy images... and the validation set of MS-COCO (5k), where the task is to detect whether a person is present in the image [48]. The text-based validation datasets comprise the Massive Multi-task Language Understanding (MMLU) benchmark [33] and the Dynamic Sentiment Analysis Dataset (DynaSent) [61]. |
| Dataset Splits | No | Each dataset is split into a 70-30 ratio for training and testing five different times so as to obtain error bars on predictions. |
| Hardware Specification | Yes | All experiments are run on a GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions specific models and libraries used (e.g., 'flan-t5 model', 'RoBERTa-base model', 'sentence transformer', 'CLIP'), but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | For our method, we set βu = 0.5, βl = 0.01, α = 0.0 for Aim 1 and βu = 0.1, βl = 0.01, α = 0.5 for Aim 2 and random prior decisions (50-50 for 0 and 1). |
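The evaluation protocol quoted in the Dataset Splits row (a 70-30 train/test split repeated five times to obtain error bars) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the dataset, accuracy function, and seeds here are placeholders.

```python
# Sketch of the repeated-split protocol from the "Dataset Splits" row:
# split the data 70-30 into train/test five times, evaluate on each test
# split, and report mean +/- standard deviation as error bars.
# The data and "accuracy" below are placeholders for illustration only.
import random
import statistics


def repeated_splits(data, n_repeats=5, train_frac=0.7, seed=0):
    """Yield (train, test) pairs from independent random shuffles."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        yield shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    data = list(range(100))  # placeholder dataset
    accs = []
    for train, test in repeated_splits(data):
        # placeholder metric: fraction of even-valued items in the test split
        accs.append(sum(x % 2 == 0 for x in test) / len(test))
    print(f"accuracy = {statistics.mean(accs):.3f} "
          f"+/- {statistics.stdev(accs):.3f}")
```

Repeating the split rather than fixing one partition is what makes the reported error bars meaningful: they capture variance due to the train/test assignment itself.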