GlobalTrait: Personality Alignment of Multilingual Word Embeddings
Authors: Farhad Bin Siddique, Dario Bertero, Pascale Fung
AAAI 2019, pp. 7015-7022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis shows that words with a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. We therefore propose a personality alignment method, GlobalTrait, which learns a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait lie close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to low-resource languages, and obtain better multilingual results than with simple monolingual or unaligned multilingual embeddings. We achieve an average F-score increase (across the three languages other than English) from 65 to 73.4 (+8.4) when comparing our monolingual model to the multilingual CNN with personality-aligned embeddings. We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset. (A minimal sketch of such a per-trait alignment follows the table.) |
| Researcher Affiliation | Collaboration | Farhad Bin Siddique (1,2), Dario Bertero (1,2), Pascale Fung (1,2,3). 1: Electronic and Computer Engineering Department; 2: Center for Artificial Intelligence Research (CAiRE); 3: EMOS Technologies Inc. The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references third-party tools and their GitHub repositories (e.g., fastText, Jieba) but does not release source code for its own method. |
| Open Datasets | Yes | We used the 2015 Author Profiling challenge dataset (PAN 2015) (Rangel et al. 2015), which includes user tweets in four languages: English (en), Spanish (es), Italian (it), and Dutch (nl). Personality labels were obtained via self-assessment using the BFI-10 item personality questionnaire (Rammstedt and John 2007). Only the training set was released via the PAN 2015 website: https://pan.webis.de/clef15/pan15-web/author-profiling.html |
| Dataset Splits | Yes | For the results shown, we carried out stratified k-fold cross-validation, making k=5 splits of the training set into training/validation sets, and report the average result across the 5 validation sets. (See the cross-validation sketch after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software tools like 'fastText' and 'Jieba segmenter' but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For MUSE training, we used a discriminator with 2 hidden layers, each of dimension 2048, and ran training for 5 epochs with 100,000 iterations per epoch. When training the personality alignment, we took the top 3,000 words correlating positively with each trait per language. For our CNN model, we used 64 filters per filter size; for the fully connected layer, we set the hidden dimension to 100 and trained each model for 100 epochs with a batch size of 10. We used binary cross entropy as the loss function and the Adam optimizer with a learning rate of 1e-4. (A PyTorch sketch of this setup follows the table.) |
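
The per-trait alignment described in the abstract can be illustrated with a small sketch. Below is a minimal, hypothetical implementation assuming an orthogonal Procrustes mapping learned from trait-correlated anchor word pairs; the paper's actual GlobalTrait procedure (built on MUSE adversarial training) may differ, and all names here (`procrustes_map`, `align_for_trait`, the anchor-pair lists) are illustrative.

```python
# Hypothetical sketch: per-trait alignment of a source-language embedding
# space onto English via orthogonal Procrustes, using word pairs that
# correlate positively with a given trait in both languages. The function
# and variable names are illustrative, not from the paper's code.
import numpy as np

def procrustes_map(X_src, X_tgt):
    """Orthogonal W minimizing ||X_src @ W - X_tgt||_F (Procrustes, via SVD)."""
    # X_src, X_tgt: (n_anchors, dim) matrices of row-aligned word vectors
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt

def align_for_trait(src_vecs, tgt_vecs, anchor_pairs):
    """Learn one source-to-English mapping per personality trait.

    src_vecs / tgt_vecs: dict word -> np.ndarray embedding
    anchor_pairs: list of (src_word, en_word) pairs correlating
                  positively with the trait in each language
    """
    X_src = np.stack([src_vecs[s] for s, _ in anchor_pairs])
    X_tgt = np.stack([tgt_vecs[t] for _, t in anchor_pairs])
    return procrustes_map(X_src, X_tgt)

# Usage: project every source-language vector with the trait-specific map.
# W = align_for_trait(es_vecs, en_vecs, extraversion_pairs)
# aligned_es = {w: v @ W for w, v in es_vecs.items()}
```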
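The stratified 5-fold protocol from the Dataset Splits row can be reproduced with scikit-learn. This is a generic sketch, not the authors' code; the shuffle seed and the macro-averaged F-score are assumptions.

```python
# Sketch of stratified k=5 cross-validation with averaged F-score.
# model_fn, X, and y are placeholders for any classifier and dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def cross_validate(model_fn, X, y, k=5, seed=42):
    """Average F-score over k stratified train/validation splits."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        model = model_fn()                      # fresh model per fold
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[val_idx])
        scores.append(f1_score(y[val_idx], preds, average="macro"))
    return float(np.mean(scores))               # mean over the 5 folds
```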
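The Experiment Setup row pins down most of the CNN hyperparameters: 64 filters per filter size, a 100-unit fully connected layer, binary cross entropy, Adam at lr=1e-4, batch size 10, 100 epochs. A minimal PyTorch sketch consistent with those numbers follows; the filter sizes (3, 4, 5) and the embedding dimension of 300 are assumptions not stated in the quoted text.

```python
# Minimal PyTorch sketch of the reported CNN classifier setup.
# Filter sizes (3, 4, 5) and emb_dim=300 are assumptions.
import torch
import torch.nn as nn

class TraitCNN(nn.Module):
    def __init__(self, emb_dim=300, n_filters=64, filter_sizes=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=fs) for fs in filter_sizes
        )
        self.fc = nn.Sequential(
            nn.Linear(n_filters * len(filter_sizes), 100),  # hidden dim = 100
            nn.ReLU(),
            nn.Linear(100, 1),  # one binary output per trait model
        )

    def forward(self, x):
        # x: (batch, seq_len, emb_dim) pre-aligned word embeddings
        x = x.transpose(1, 2)  # Conv1d expects (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1)).squeeze(1)

model = TraitCNN()
criterion = nn.BCEWithLogitsLoss()  # binary cross entropy, as reported
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Training loop (not shown): 100 epochs, batch size 10, one model per trait.
```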