reproducibilityindex.ai

Effective Human-AI Teams via Learned Natural Language Rules and Onboarding

Authors: Hussein Mozannar, Jimin Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through user studies on object detection and question-answering tasks, we show that our method can lead to more accurate human-AI teams. We also evaluate our region discovery and description algorithms separately.
Researcher Affiliation	Collaboration	1MIT-IBM Watson AI Lab, Cambridge, MA 2CSAIL and IMES, Massachusetts Institute of Technology, Cambridge, MA 3IBM Research, Cambridge, MA
Pseudocode	Yes	Algorithm 1 IntegrAI-Describe Input: Dataset D, region Nk
Open Source Code	Yes	Code is available in https://github.com/clinicalml/onboarding_human_ai.
Open Datasets	Yes	The image datasets include Berkeley Deep Drive (BDD) [83] where the task is to detect the presence of traffic lights in noisy images... and the validation set of MS-COCO (5k) where the task whether a person is present in the image [48]. The text-based validation datasets comprise of Massive Multi-task Language Understanding (MMLU) [33], and Dynamic Sentiment Analysis Dataset (DynaSent) [61].
Dataset Splits	No	Each dataset is split into 70-30 ratio for training and testing five different times so as to obtain error bars of predictions.
Hardware Specification	Yes	All experiments are run on a GeForce GTX 1080 Ti.
Software Dependencies	No	The paper mentions specific models and libraries used (e.g., 'flan-t5 model', 'roBERTa-base model', 'sentence transformer', 'CLIP'), but it does not specify version numbers for any software dependencies.
Experiment Setup	Yes	For our method, we set βu = 0.5, βl = 0.01, α = 0.0 for Aim 1 and βu = 0.1, βl = 0.01, α = 0.5 for Aim 2 and random prior decisions (50-50 for 0 and 1).
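
The Dataset Splits row describes a 70-30 train/test split repeated five times to obtain error bars. The sketch below illustrates that general procedure. It is not the authors' code: the function names (repeated_splits, mean_and_stderr), the seeding scheme, and the use of the standard error as the error bar are assumptions made for this example.

```python
import random
import statistics


def repeated_splits(n_items, n_repeats=5, train_frac=0.7, base_seed=0):
    """Return n_repeats random (train_idx, test_idx) index splits.

    Hypothetical helper: each repeat reshuffles all indices with its own
    seed, then cuts at train_frac (70-30 by default).
    """
    splits = []
    for r in range(n_repeats):
        rng = random.Random(base_seed + r)  # assumed seeding scheme
        idx = list(range(n_items))
        rng.shuffle(idx)
        cut = int(round(train_frac * n_items))
        splits.append((idx[:cut], idx[cut:]))
    return splits


def mean_and_stderr(values):
    """Mean and standard error of a metric measured once per split."""
    mean = statistics.mean(values)
    stderr = statistics.stdev(values) / len(values) ** 0.5
    return mean, stderr


if __name__ == "__main__":
    # Toy usage: the "metric" here is just the fraction of even indices
    # in each test split, standing in for a real test accuracy.
    scores = []
    for train_idx, test_idx in repeated_splits(100):
        scores.append(sum(i % 2 == 0 for i in test_idx) / len(test_idx))
    print(mean_and_stderr(scores))
```

In practice the per-split metric would be computed by fitting on train_idx and evaluating on test_idx; the paper's exact splitting code is in the repository listed under Open Source Code.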