Tailor Versatile Multi-Modal Learning for Multi-Label Emotion Recognition
Authors: Yi Zhang, Mingyuan Chen, Jundong Shen, Chongjun Wang (pp. 9100-9108)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we conduct experiments on the benchmark MMER dataset CMU-MOSEI in both aligned and unaligned settings, which demonstrate the superiority of TAILOR over the state-of-the-arts. ... In this section, we give empirically evaluations and analysis of our proposed TAILOR |
| Researcher Affiliation | Academia | State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China {njuzhangy, mychen, jdshen}@smail.nju.edu.cn, {chjwang}@nju.edu.cn |
| Pseudocode | No | The paper includes mathematical equations, but it does not present any pseudocode or algorithm blocks with structured steps formatted like code. |
| Open Source Code | Yes | 2https://github.com/kniter1/TAILOR |
| Open Datasets | Yes | We conduct experiments on benchmark multimodal multi-label dataset CMU-MOSEI (Zadeh et al. 2018c) |
| Dataset Splits | No | Table 1 summarizes details of CMU-MOSEI in both word-aligned and unaligned settings. While the paper mentions using CMU-MOSEI, a benchmark dataset, it does not explicitly provide the specific training/validation/test splits (e.g., percentages or sample counts) used for reproducibility. It only lists modality dimensions and sequence lengths. |
| Hardware Specification | Yes | All experiments are running with one GTX 1080Ti GPU. |
| Software Dependencies | No | The paper mentions that parameters are optimized by Adam (Kingma and Ba 2015), but it does not provide specific version numbers for any software components, libraries, or programming languages used. |
| Experiment Setup | Yes | We set hyper-parameters α = 0.01, β = 5e-6 and γ = 0.5. The batch size is 64. For layer number in Transformer Encoder, we set nv = na = 4, nt = 6 in uni-modal encoders, nc = 3 in cross-modal encoders. The size of hidden layers in encoders and decoder is d = 256, the head number hl = hm = 8. All parameters in TAILOR are optimized by Adam (Kingma and Ba 2015) with an initial learning rate of 1e-5 for aligned setting, 1e-4 for unaligned setting and employ a linear decay learning rate schedule with a warm-up strategy. |
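The setup quote specifies only the initial learning rates and a "warm-up then linear decay" schedule, not the warm-up length or total step count. A minimal sketch of such a schedule, with `warmup_ratio` and `total_steps` as assumed illustrative values (not taken from the paper):

```python
def lr_at_step(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Warm-up + linear-decay schedule.

    base_lr matches the paper's aligned-setting value (1e-5; use 1e-4
    for the unaligned setting); warmup_ratio and total_steps are
    hypothetical, as the paper does not report them.
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # linear warm-up from 0 up to base_lr
        return base_lr * step / warmup_steps
    # linear decay from base_lr down to 0 over the remaining steps
    remaining = total_steps - warmup_steps
    return base_lr * max(0.0, (total_steps - step) / remaining)
```

For example, with `total_steps=100` the rate ramps to `1e-5` over the first 10 steps, then decays linearly back to 0 by step 100.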