Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation
Authors: Jun Yu, Keda Lu, Ji Zhao, Zhihong Wei, Iek-Heng Chu, Peng Chang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally demonstrate that our proposed dialogue cross-enhanced CEAM is more effective compared to existing methods. First, we introduce the dataset and evaluation metrics. Then, we present the experimental setup and the main results. Finally, we conduct ablation studies to analyze the necessity of each component in the architecture. |
| Researcher Affiliation | Collaboration | Jun Yu1,2, Keda Lu1,3, Ji Zhao1, Zhihong Wei1, Iek-Heng Chu4 and Peng Chang4 — 1University of Science and Technology of China, 2Jianghuai Advance Technology Center, 3Ping An Technology Co., Ltd, China, 4PAII Inc. harryjun@ustc.edu.cn, {lukeda, jzhao_tco, weizh588}@mail.ustc.edu.cn, {zhuyixing276, changpeng805}@paii-labs.com |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (e.g., Figure 3), but it does not contain formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our source codes and model checkpoints are available at https://github.com/wujiekd/Dialogue_Cross-Enhanced-CEAM. |
| Open Datasets | Yes | The NOXI for Engagement Estimation dataset was obtained by Müller et al. [2023] by re-labeling the published NOvice eXpert Interaction (NOXI) database [Cafaro et al., 2017]. |
| Dataset Splits | Yes | The dataset, which is currently the longest recorded and the only dataset with continuously annotated engagement scores, is divided into a training set and a validation set. |
| Hardware Specification | Yes | We train all our models for 100 epochs on 1 Nvidia V100 GPU with a batch size of 32. |
| Software Dependencies | No | The paper mentions using a 'Reduce Learning Rate On Plateau algorithm' and an 'Adam optimizer' for training, but it does not specify version numbers for general software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The core length is set to 32 with an extended window length of 32... The SA block comprises MSA with 8 heads... The FFN of the SA encoder consists of 2 linear layers with dimensions of 768×4 and 768, respectively... When using the dialogue cross-enhanced module, we set N = 1, M = 1, K = 2... the block skip connection coefficient α is set to 0.5... We train all our models for 100 epochs on 1 Nvidia V100 GPU with a batch size of 32... Other setups include a learning rate scheduler, specifically utilizing the Reduce Learning Rate On Plateau algorithm, with a reduction factor of 0.5 and a patience of 10 epochs. Additionally, we use an Adam optimizer with a learning rate of 1e-3 and our proposed center MSE loss function with β = 0.5. (See the configuration sketch after this table.) |
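
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the reported training configuration. It uses a standard `nn.TransformerEncoderLayer` as a stand-in for the paper's SA block, and the names `center_mse_loss`, `run_one_epoch`, and `evaluate` are hypothetical placeholders; only the numeric hyperparameters are taken from the paper, and the full CEAM with the dialogue cross-enhanced module is not reproduced here.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EMBED_DIM = 768              # SA encoder feature dimension
NUM_HEADS = 8                # MSA heads
FFN_DIM = EMBED_DIM * 4      # first FFN linear layer (the second maps back to 768)
CORE_LEN, EXT_LEN = 32, 32   # core window length and extended window length
BATCH_SIZE, EPOCHS = 32, 100

# Stand-in backbone: a single Transformer encoder layer approximating one SA block.
# The dialogue cross-enhanced module (N = 1, M = 1, K = 2, alpha = 0.5) is omitted.
model = nn.TransformerEncoderLayer(
    d_model=EMBED_DIM, nhead=NUM_HEADS, dim_feedforward=FFN_DIM, batch_first=True
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

mse = nn.MSELoss()

def center_mse_loss(pred, target, beta=0.5):
    """Hypothetical placeholder for the paper's center MSE loss (beta = 0.5).

    The exact weighting over core vs. extended window frames is not reproduced;
    this only illustrates where beta enters the objective.
    """
    return beta * mse(pred, target)

# Training skeleton: 100 epochs, batch size 32, single GPU, as reported.
# `run_one_epoch` and `evaluate` are hypothetical helpers, not from the paper.
# for epoch in range(EPOCHS):
#     train_loss = run_one_epoch(model, optimizer, center_mse_loss)
#     val_loss = evaluate(model)
#     scheduler.step(val_loss)
```

The `ReduceLROnPlateau` call mirrors the reported schedule (factor 0.5, patience 10 epochs, stepped on validation loss); everything about the model body itself is an assumption made only to keep the sketch runnable.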