Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning

Authors: Dengming Zhang, Weitao You, Ziheng Liu, Lingyun Sun, Pei Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our objective and subjective experiments demonstrate that our method can achieve state-of-the-art performance in both traditional DMER and PDMER. Objective experiments demonstrate that our method achieves the best performance in both traditional DMER and PDMER. Subjective experiments also show that our method better conforms to individual personalized emotional perception in the real world. In the ablation study, we evaluated the effectiveness of each component of our model."
Researcher Affiliation | Academia | "(1) School of Software Technology, Zhejiang University, China; (2) College of Computer Science and Technology, Zhejiang University, China"
Pseudocode | No | "The paper describes the model architecture and personalized strategy in detail with equations and figures, but does not provide structured pseudocode or algorithm blocks."
Open Source Code | Yes | "Code & Case: https://littleor.github.io/PDMER"
Open Datasets | Yes | "The performance of DSAML is evaluated using two publicly available DMER datasets, both of which provide V-A value annotations every 0.5 seconds, with all unstable annotations from the beginning to 15 seconds removed. The first dataset is the DEAM dataset (Aljanaki, Yang, and Soleymani 2017)... The second dataset is the PMEmo dataset (Zhang et al. 2018)"
Dataset Splits | Yes | "In our experiments, we use the 58 full-length songs as the test set, with the remaining 1744 songs as the training set. Notably, 744 of the 45-second clips do not have annotator IDs, meaning we cannot determine the annotators for these songs. Therefore, in our proposed personalized task construction strategy, we only use 1000 songs as the training set. We discard 122 samples with song lengths less than 25 seconds and use 40 songs longer than 65 seconds as the test set, with the remaining 632 songs as the training set. During training, only one sample is used for fast adaptation (i.e., both Si and Sp only contain 1 sample), and 15 samples are used for evaluation (i.e., Qi contains 15 samples)."
Hardware Specification | Yes | "We train the model for 2000 episodes on a single NVIDIA GeForce RTX 4090 GPU."
Software Dependencies | No | "The Adam optimizer (Kingma and Ba 2014) is employed with a learning rate of 0.00005." This mentions an optimizer but not specific software libraries or platforms with version numbers.
Experiment Setup | Yes | "The resolution of DSAML is 2Hz, indicating there is one label every 0.5 seconds. The model architecture consists of 3 layers of Transformer, with a mask context length of nl = 5 and ng = 30. In attention loss, α = 0.5 and β = 0.05. During training, only one sample is used for fast adaptation (i.e., both Si and Sp only contain 1 sample), and 15 samples are used for evaluation (i.e., Qi contains 15 samples). The Adam optimizer (Kingma and Ba 2014) is employed with a learning rate of 0.00005. We train the model for 2000 episodes on a single NVIDIA GeForce RTX 4090 GPU."
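The quoted setup details (2 Hz V-A annotations with the first 15 seconds removed, and episodes built from a 1-sample support set and a 15-sample query set) can be sketched as follows. This is a minimal illustration, not the authors' code; the function and variable names are hypothetical, and only the numeric constants come from the paper text quoted above.

```python
# Sketch of the annotation timeline and episodic split described above.
# Constants are taken from the quoted paper text; all names are illustrative.

RESOLUTION_HZ = 2          # one V-A label every 0.5 seconds
UNSTABLE_SECONDS = 15      # unstable annotations at the start are removed
SUPPORT_SIZE = 1           # both S_i and S_p contain 1 sample (fast adaptation)
QUERY_SIZE = 15            # Q_i contains 15 samples (evaluation)

def usable_labels(clip_seconds: float) -> int:
    """Number of V-A labels left after trimming the unstable opening."""
    usable = max(0.0, clip_seconds - UNSTABLE_SECONDS)
    return int(usable * RESOLUTION_HZ)

def make_episode(samples: list) -> tuple:
    """Split one annotator's samples into a support set and a query set."""
    if len(samples) < SUPPORT_SIZE + QUERY_SIZE:
        raise ValueError("not enough samples for one episode")
    support = samples[:SUPPORT_SIZE]
    query = samples[SUPPORT_SIZE:SUPPORT_SIZE + QUERY_SIZE]
    return support, query

# A 45-second DEAM clip keeps (45 - 15) * 2 = 60 labels.
print(usable_labels(45))              # 60
support, query = make_episode(list(range(16)))
print(len(support), len(query))       # 1 15
```

The episode sizes mirror the one-shot adaptation regime the report quotes: a single labeled sample per annotator is enough for the personalized adaptation step, while the remaining 15 samples measure how well the adapted model matches that annotator.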