Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders

Authors: Jinghui Qin, Changsong Liu, Tianchi Tang, Dahuang Liu, Minghao Wang, Qianying Huang, Rumin Zhang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on MMPsy and the DAIC-WOZ dataset demonstrate the effectiveness of Mental-Perceiver in anxiety and depression detection.
Researcher Affiliation | Collaboration | Jinghui Qin (1), Changsong Liu (2,3), Tianchi Tang (2), Dahuang Liu (2), Minghao Wang (2), Qianying Huang (2), Rumin Zhang (2,4)*. (1) Guangdong University of Technology; (2) Guangdong Shuye Intelligent Technology Co., Ltd.; (3) University of Toronto; (4) Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China.
Pseudocode | No | The paper describes the Mental-Perceiver architecture using textual descriptions, mathematical equations, and an illustration (Figure 1), but it contains no clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper provides the link 'Datasets https://github.com/shuyeit/mmpsy-data', which explicitly refers to data. No statement or link is given for open-sourcing the Mental-Perceiver model's code.
Open Datasets | Yes | To address this, we introduce the Multi-Modal Psychological assessment corpus (MMPsy), a large-scale dataset... Datasets https://github.com/shuyeit/mmpsy-data
Dataset Splits | Yes | These data were randomly partitioned into training, validation, and test sets using an 8:1:1 ratio. The anxiety detection subset comprises 6,188 training, 774 validation, and 774 test participants, while the depression detection subset contains 3,397 training, 425 validation, and 425 test participants.
Hardware Specification | Yes | We use PyTorch to implement our framework on Linux with two NVIDIA RTX 4090 GPU cards.
Software Dependencies | No | The paper mentions PyTorch, the AdamW optimizer, Surfboard for audio feature extraction, and BERT for text features, but it provides no version numbers for any of these dependencies, which are needed for reproducibility.
Experiment Setup | Yes | The feature dimension Dx is set to 768 and the other dimensions Dz, Dq, and Dy are all set to 512. ... We trained models for 200 epochs with an initial learning rate of 0.00003 and used LambdaLR to adjust the learning rate during training. Early stopping with patience 15 is deployed to accelerate training.
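The reported 8:1:1 split counts can be checked arithmetically. A minimal sketch (the helper `split_8_1_1` is hypothetical, and the rounding rule is an assumption: validation and test each receive the rounded 10% share, with training taking the remainder) reproduces the participant counts stated in the Dataset Splits row:

```python
import random

def split_8_1_1(ids, seed=0):
    """Randomly partition participant IDs into train/val/test at an 8:1:1 ratio.
    Assumption: val and test each get round(10% of n); train gets the rest."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_eval = round(n * 0.1)                    # size of validation and of test
    return (ids[: n - 2 * n_eval],             # training set
            ids[n - 2 * n_eval : n - n_eval],  # validation set
            ids[n - n_eval :])                 # test set

# Anxiety subset: 7,736 participants -> 6,188 / 774 / 774
# Depression subset: 4,247 participants -> 3,397 / 425 / 425
```

Both totals match the paper's reported splits under this rounding rule, though the authors do not state which rule they used.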
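The Experiment Setup row names a LambdaLR learning-rate schedule and early stopping with patience 15 but not their exact behavior. The framework-agnostic sketch below shows one plausible wiring; the linear decay function and the loss-based stopping criterion are assumptions, not details from the paper:

```python
BASE_LR = 3e-5      # initial learning rate reported in the paper
TOTAL_EPOCHS = 200  # training length reported in the paper

def lr_lambda(epoch, total_epochs=TOTAL_EPOCHS):
    """Multiplier for a LambdaLR-style schedule (lr = BASE_LR * lr_lambda(epoch)).
    Linear decay is an assumption; the paper does not give the exact function."""
    return max(0.0, 1.0 - epoch / total_epochs)

class EarlyStopper:
    """Halt training when validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

In a PyTorch training loop, `lr_lambda` would be passed to `torch.optim.lr_scheduler.LambdaLR` over an AdamW optimizer, and `EarlyStopper.step` would be called once per epoch on the validation loss.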