GlobalTrait: Personality Alignment of Multilingual Word Embeddings
Authors: Farhad Bin Siddique, Dario Bertero, Pascale Fung
AAAI 2019, pp. 7015-7022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis shows that words with a similar semantic meaning in different languages do not necessarily correspond to the same personality traits. We therefore propose a personality alignment method, GlobalTrait, which learns a mapping for each trait from the source language to the target language (English), such that words that correlate positively with each trait lie close together in the multilingual vector space. Using these aligned embeddings for training, we can transfer personality-related training features from high-resource languages such as English to low-resource languages, and obtain better multilingual results than with simple monolingual or unaligned multilingual embeddings. We achieve an average F-score increase (across the three languages other than English) from 65 to 73.4 (+8.4) when comparing our monolingual model to the multilingual CNN with personality-aligned embeddings. We also show relatively good performance on the regression tasks, and better classification results when evaluating our model on a separate Chinese dataset. (A minimal sketch of such a per-trait alignment follows the table.) |
| Researcher Affiliation | Collaboration | Farhad Bin Siddique (1,2), Dario Bertero (1,2), Pascale Fung (1,2,3). 1: Electronic and Computer Engineering Department; 2: Center for Artificial Intelligence Research (CAiRE); 3: EMOS Technologies Inc. The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references third-party tools and their GitHub repositories (e.g., fastText, Jieba) but does not release source code for its own method. |
| Open Datasets | Yes | We used the 2015 Author Profiling challenge dataset (PAN 2015) (Rangel et al. 2015), which includes user tweets in four languages: English (en), Spanish (es), Italian (it), and Dutch (nl). Personality labels were obtained via self-assessment using the BFI-10 item personality questionnaire (Rammstedt and John 2007). Only the training set was released via the PAN 2015 website: https://pan.webis.de/clef15/pan15-web/author-profiling.html |
| Dataset Splits | Yes | For the results shown, we carried out stratified k-fold cross-validation, making k=5 splits of the training set into training/validation sets, and report the average result across the 5 validation sets. (See the cross-validation sketch after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software tools like 'fastText' and 'Jieba segmenter' but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | For MUSE training, we used a discriminator with 2 hidden layers, each of dimension 2048, and ran training for 5 epochs with 100,000 iterations per epoch. When training the personality alignment, we took the top 3,000 words correlating positively with each trait per language. For our CNN model, we used 64 filters per filter size; for the fully connected layer, we set the hidden dimension to 100 and trained each model for 100 epochs with a batch size of 10. We used binary cross entropy as the loss function and the Adam optimizer with a learning rate of 1e-4. (A PyTorch sketch of this setup follows the table.) |
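
The per-trait alignment described in the abstract can be illustrated with a small sketch. Below is a minimal, hypothetical implementation assuming an orthogonal Procrustes mapping learned from trait-correlated anchor word pairs; the paper's actual GlobalTrait procedure (built on MUSE adversarial training) may differ, and all names here (`procrustes_map`, `align_for_trait`, the anchor-pair lists) are illustrative.

```python
# Hypothetical sketch: per-trait alignment of a source-language embedding
# space onto English via orthogonal Procrustes, using word pairs that
# correlate positively with a given trait in both languages. The function
# and variable names are illustrative, not from the paper's code.
import numpy as np

def procrustes_map(X_src, X_tgt):
    """Orthogonal W minimizing ||X_src @ W - X_tgt||_F (Procrustes, via SVD)."""
    # X_src, X_tgt: (n_anchors, dim) matrices of row-aligned word vectors
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt

def align_for_trait(src_vecs, tgt_vecs, anchor_pairs):
    """Learn one source-to-English mapping per personality trait.

    src_vecs / tgt_vecs: dict word -> np.ndarray embedding
    anchor_pairs: list of (src_word, en_word) pairs correlating
                  positively with the trait in each language
    """
    X_src = np.stack([src_vecs[s] for s, _ in anchor_pairs])
    X_tgt = np.stack([tgt_vecs[t] for _, t in anchor_pairs])
    return procrustes_map(X_src, X_tgt)

# Usage: project every source-language vector with the trait-specific map.
# W = align_for_trait(es_vecs, en_vecs, extraversion_pairs)
# aligned_es = {w: v @ W for w, v in es_vecs.items()}
```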
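The stratified 5-fold protocol from the Dataset Splits row can be reproduced with scikit-learn. This is a generic sketch, not the authors' code; the shuffle seed and the macro-averaged F-score are assumptions.

```python
# Sketch of stratified k=5 cross-validation with averaged F-score.
# model_fn, X, and y are placeholders for any classifier and dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def cross_validate(model_fn, X, y, k=5, seed=42):
    """Average F-score over k stratified train/validation splits."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        model = model_fn()                      # fresh model per fold
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[val_idx])
        scores.append(f1_score(y[val_idx], preds, average="macro"))
    return float(np.mean(scores))               # mean over the 5 folds
```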
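The Experiment Setup row pins down most of the CNN hyperparameters: 64 filters per filter size, a 100-unit fully connected layer, binary cross entropy, Adam at lr=1e-4, batch size 10, 100 epochs. A minimal PyTorch sketch consistent with those numbers follows; the filter sizes (3, 4, 5) and the embedding dimension of 300 are assumptions not stated in the quoted text.

```python
# Minimal PyTorch sketch of the reported CNN classifier setup.
# Filter sizes (3, 4, 5) and emb_dim=300 are assumptions.
import torch
import torch.nn as nn

class TraitCNN(nn.Module):
    def __init__(self, emb_dim=300, n_filters=64, filter_sizes=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=fs) for fs in filter_sizes
        )
        self.fc = nn.Sequential(
            nn.Linear(n_filters * len(filter_sizes), 100),  # hidden dim = 100
            nn.ReLU(),
            nn.Linear(100, 1),  # one binary output per trait model
        )

    def forward(self, x):
        # x: (batch, seq_len, emb_dim) pre-aligned word embeddings
        x = x.transpose(1, 2)  # Conv1d expects (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1)).squeeze(1)

model = TraitCNN()
criterion = nn.BCEWithLogitsLoss()  # binary cross entropy, as reported
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Training loop (not shown): 100 epochs, batch size 10, one model per trait.
```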