Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation

Authors: Jun Yu, Keda Lu, Ji Zhao, Zhihong Wei, Iek-Heng Chu, Peng Chang

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we experimentally demonstrate that our proposed dialogue cross-enhanced CEAM is more effective compared to existing methods. First, we introduce the dataset and evaluation metrics. Then, we present the experimental setup and the main results. Finally, we conduct ablation studies to analyze the necessity of each component in the architecture.
Researcher Affiliation | Collaboration | Jun Yu (1,2), Keda Lu (1,3), Ji Zhao (1), Zhihong Wei (1), Iek-Heng Chu (4) and Peng Chang (4). (1) University of Science and Technology of China, (2) Jianghuai Advance Technology Center, (3) Ping An Technology Co., Ltd, China, (4) PAII Inc. Emails: harryjun@ustc.edu.cn, {lukeda, jzhao_tco, weizh588}@mail.ustc.edu.cn, {zhuyixing276, changpeng805}@paii-labs.com
Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (e.g., Figure 3), but it does not contain formal pseudocode blocks or algorithms.
Open Source Code | Yes | Our source codes and model checkpoints are available at https://github.com/wujiekd/Dialogue-Cross-Enhanced-CEAM.
Open Datasets | Yes | The NOXI for Engagement Estimation dataset was obtained by Müller et al. [2023] by re-labeling the published NOvice eXpert Interaction database (NOXI) [Cafaro et al., 2017].
Dataset Splits | Yes | The dataset, which is currently the longest recorded and the only dataset with continuously annotated engagement scores, is divided into training and validation sets.
Hardware Specification | Yes | We train all our models for 100 epochs on 1 NVIDIA V100 GPU with a batch size of 32.
Software Dependencies | No | The paper mentions using a 'Reduce Learning Rate On Plateau algorithm' and an 'Adam optimizer' for training, but it does not specify version numbers for general software dependencies like Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The core length is set to 32 with an extended window length of 32... The SA block comprises MSA with 8 heads... The FFN of the SA encoder consists of 2 linear layers with dimensions of 768×4 and 768, respectively... When using the dialogue cross-enhanced module, we set N = 1, M = 1, K = 2... the block skip connection coefficient α is set to 0.5... We train all our models for 100 epochs on 1 NVIDIA V100 GPU with a batch size of 32... Other setups include a learning rate scheduler, specifically utilizing the Reduce Learning Rate On Plateau algorithm, with a reduction factor of 0.5 and a patience of 10 epochs. Additionally, we use an Adam optimizer with a learning rate of 1e-3 and our proposed center MSE loss function with β set to 0.5.
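The optimizer and scheduler settings quoted above map directly onto standard PyTorch components. The sketch below is a minimal, hedged reconstruction of that training configuration, assuming a PyTorch implementation; the model, the fake batch, and the plain MSE loss are placeholders (the paper's actual center MSE loss with β = 0.5 is defined in the authors' repository, not here).

```python
import torch
from torch import nn, optim

# Placeholder model standing in for the dialogue cross-enhanced CEAM;
# the real architecture is provided in the authors' released code.
model = nn.Linear(768, 1)

# Hyperparameters quoted in the experiment setup.
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # Adam, learning rate 1e-3
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10   # reduce-on-plateau, factor 0.5, patience 10
)

# Plain MSE as a stand-in for the paper's center MSE loss (beta = 0.5).
criterion = nn.MSELoss()

for epoch in range(100):                             # 100 training epochs
    # A real data loader would yield batches of 32 feature windows with
    # engagement targets; one synthetic batch keeps the sketch runnable.
    features = torch.randn(32, 768)
    targets = torch.rand(32, 1)

    optimizer.zero_grad()
    preds = model(features)
    loss = criterion(preds, targets)
    loss.backward()
    optimizer.step()

    # The validation loss would normally drive the scheduler;
    # the training loss is reused here for brevity.
    scheduler.step(loss.item())
```

This mirrors only the hyperparameters the paper reports (epochs, batch size, optimizer, learning rate, scheduler factor and patience); any detail beyond those is an assumption for illustration.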