The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models, while EmotionAttack can hinder it. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, Beijing, China; ²Institute of Software, Chinese Academy of Sciences, Beijing, China; ³Department of Computer Science, William & Mary, Williamsburg, Virginia, USA; ⁴School of Psychology, Beijing Normal University, Beijing, China; ⁵Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links about open-sourcing the code for its described methodology. |
| Open Datasets | Yes | Specifically, we adopted 50 tasks from two popular datasets, including Instruction Induction (Honovich et al., 2022) and BIG-Bench-Hard (Suzgun et al., 2022), to evaluate semantic understanding and logical reasoning abilities, leading to 940,200 evaluations. |
| Dataset Splits | No | The paper mentions using specific datasets for evaluation but does not provide explicit details about training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any particular hardware, such as GPU or CPU models, used to run the experiments. |
| Software Dependencies | No | The paper mentions the use of large language and multi-modal models (e.g., GPT-4, Llama 2, DALL-E) but does not list the software dependencies, programming languages, or library versions used for its own implementation. |
| Experiment Setup | Yes | For ChatGPT, we utilize gpt-3.5-turbo (0613) and set the temperature parameter to 0.7. For GPT-4 and Llama 2, we set the temperature to 0.7. The remaining LLMs are evaluated using their default settings. A minimal replication sketch based on these settings follows the table. |
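
To make the reported setup concrete, here is a minimal sketch of an EmotionPrompt-style query at the settings quoted above: gpt-3.5-turbo (0613) at temperature 0.7, with an emotional stimulus appended to the task prompt. It assumes the OpenAI Python SDK (v1 interface) with an API key in the environment; the task prompt and the stimulus string are illustrative placeholders, not the paper's exact prompts.

```python
# Minimal sketch of the reported setup: gpt-3.5-turbo (0613) at temperature 0.7,
# with an emotional stimulus appended to the task prompt (EmotionPrompt-style).
# Assumes the OpenAI Python SDK (v1 interface) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

TASK_PROMPT = "Determine whether the second sentence is a paraphrase of the first."  # illustrative
STIMULUS = "This is very important to my career."  # illustrative EmotionPrompt-style stimulus

def query(prompt: str, stimulus: str = "") -> str:
    """Send a single prompt (optionally with an appended emotional stimulus) to the model."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",  # the (0613) snapshot named in the paper
        temperature=0.7,             # temperature reported for ChatGPT, GPT-4, and Llama 2
        messages=[{"role": "user", "content": f"{prompt} {stimulus}".strip()}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    baseline = query(TASK_PROMPT)             # plain prompt
    emotional = query(TASK_PROMPT, STIMULUS)  # prompt + emotional stimulus
    print("Baseline:     ", baseline)
    print("EmotionPrompt:", emotional)
```

Note that OpenAI has since retired the gpt-3.5-turbo-0613 snapshot, so an exact reproduction may require substituting a currently available model version.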