Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Causally Disentangled Representations for Fair Personality Detection

Authors: Yangfu Zhu, Meiling Li, Yuting Wei, Di Liu, Yuqing Li, Bin Wu

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on three real-world datasets demonstrate that our IPDN outperforms state-of-the-art methods in personality detection.
Researcher Affiliation | Academia | 1 College of Information Engineering, Capital Normal University, Beijing, China 2 Beijing University of Posts and Telecommunications, Beijing, China EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | Kaggle dataset is collected from Personality Cafe platform... Pandora consists of Reddit posts... [Gjurković et al., 2021]. Essays is a well-known stream-of-consciousness dataset [Pennebaker and King, 1999]
Dataset Splits | Yes | Following previous works [Yang et al., 2021b; Yang et al., 2023b], these datasets are randomly divided 6:2:2 into training, validation, and test sets.
Hardware Specification | Yes | We implement our IPDN in PyTorch 1.11.0 and train it on three NVIDIA GeForce RTX 2080 GPUs.
Software Dependencies | Yes | We implement our IPDN in PyTorch 1.11.0 and train it on three NVIDIA GeForce RTX 2080 GPUs. We utilized the Adam optimizer (Kingma et al., 2017).
Experiment Setup | Yes | We utilized the Adam optimizer (Kingma et al., 2017) and searched for the learning rate among {1e-2, 1e-3, 1e-4}. IPDN is trained for 80 and 120 epochs in single-dataset and cross-dataset experiments, respectively, with an early-stopping strategy. To initialize the post embeddings, we used the pre-trained language model BERT with the bert-base-cased architecture. The output dimensions of the mapping function are set to 200, 200, and 300 for the Kaggle, Pandora, and Essays datasets, respectively. The dimension of the confounder prototype matches the output dimension of the mapping function, which facilitates feature-level computation. The size K of the confounder dictionary C = [c1, c2, ..., cK] (i.e., the number of clusters) is set to 64, 128, and 64 for the three datasets, respectively. The trade-off parameter λ is searched in (0, 1) for each dataset.
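The reported random 6:2:2 division into training, validation, and test sets can be sketched as follows. This is a minimal reconstruction, not the authors' code; the helper name `split_indices` and the fixed seed are our assumptions.

```python
import random

def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle n example indices and cut them into train/val/test by the given ratios."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed is an assumption for reproducibility
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(100)
# 60 / 20 / 20 disjoint index sets
```

The remainder after the train and validation cuts goes to the test set, so the three parts always cover every example exactly once.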
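The hyperparameter search described in the experiment setup can be organized as a small grid. The learning-rate grid and the per-dataset mapping dimensions and dictionary sizes K are taken from the reported setup; the concrete λ grid is our assumption, since the paper only states that λ is searched in (0, 1).

```python
from itertools import product

# Search space reported in the paper (λ grid values are our assumption).
search_space = {
    "lr": [1e-2, 1e-3, 1e-4],
    "lambda_": [0.1, 0.3, 0.5, 0.7, 0.9],
}

# Dataset-specific settings from the reported setup:
# mapping-function output dimension and confounder-dictionary size K.
dataset_config = {
    "Kaggle":  {"map_dim": 200, "K": 64},
    "Pandora": {"map_dim": 200, "K": 128},
    "Essays":  {"map_dim": 300, "K": 64},
}

def candidate_configs(dataset):
    """Yield one full config per (lr, λ) combination for the given dataset."""
    base = dataset_config[dataset]
    for lr, lam in product(search_space["lr"], search_space["lambda_"]):
        yield {**base, "lr": lr, "lambda_": lam}

print(len(list(candidate_configs("Kaggle"))))  # 3 learning rates x 5 λ values = 15
```

Each candidate would then be trained with Adam for the reported epoch budget and selected by validation performance under early stopping.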