The Good, The Bad, and Why: Unveiling Emotions in Generative AI
Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments involving language and multi-modal models on semantic understanding, logical reasoning, and generation tasks, we demonstrate that both textual and visual EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it. |
| Researcher Affiliation | Collaboration | (1) Microsoft Research, Beijing, China; (2) Institute of Software, CAS, Beijing, China; (3) Department of Computer Science, William & Mary, Williamsburg, Virginia, USA; (4) School of Psychology, Beijing Normal University, Beijing, China; (5) Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links about open-sourcing the code for its described methodology. |
| Open Datasets | Yes | Specifically, we adopted 50 tasks from two popular datasets, including Instruction Induction (Honovich et al., 2022) and BIG-Bench-Hard (Suzgun et al., 2022), to evaluate semantic understanding and logical reasoning abilities, leading to 940,200 evaluations. |
| Dataset Splits | No | The paper mentions using specific datasets for evaluation but does not provide explicit details about training, validation, and test splits (e.g., percentages or sample counts) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as specific GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of large language models and multi-modal models (e.g., GPT-4, Llama2, DALL-E) but does not list specific software dependencies, programming languages, or libraries with version numbers used for their own implementation. |
| Experiment Setup | Yes | For ChatGPT, we utilize gpt-3.5-turbo (0613) and set the temperature parameter to 0.7. For GPT-4 and Llama 2, we also set the temperature to 0.7. The remaining LLMs are evaluated using their default settings. |
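The reported setup (the gpt-3.5-turbo 0613 snapshot, temperature 0.7, and an emotional stimulus appended to the task prompt) could be reproduced along these lines. This is a minimal sketch, not the authors' code: the `build_request` helper and the example stimulus string are illustrative assumptions, and the payload mirrors the OpenAI chat-completions request shape rather than any framework the paper used.

```python
# Sketch of the evaluation settings described in the paper.
# The stimulus text and helper names below are assumptions for illustration.

EMOTION_STIMULUS = "This is very important to my career."  # example EmotionPrompt-style suffix


def build_request(task_prompt: str, use_emotion_prompt: bool = True) -> dict:
    """Assemble a chat-completion payload with the settings reported in the paper."""
    prompt = f"{task_prompt} {EMOTION_STIMULUS}" if use_emotion_prompt else task_prompt
    return {
        "model": "gpt-3.5-turbo-0613",  # the gpt-3.5-turbo (0613) snapshot used for ChatGPT
        "temperature": 0.7,             # temperature reported for ChatGPT, GPT-4, and Llama 2
        "messages": [{"role": "user", "content": prompt}],
    }


request = build_request("Classify the sentiment of: 'I loved this movie.'")
```

Keeping the temperature fixed at 0.7 across models, and toggling only `use_emotion_prompt`, isolates the emotional stimulus as the single varying factor between the baseline and EmotionPrompt conditions.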