Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology

Authors: Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments suggest that the ideological subspace encodes abstract evaluative semantics and reflects changes in the political left-right spectrum during the presidency of Donald Trump. Table 2: Performance on link prediction (MAUC).
Researcher Affiliation | Academia | Valentin Hofmann (1,2), Janet B. Pierrehumbert (3,1), Hinrich Schütze (2); 1: Faculty of Linguistics, University of Oxford; 2: Center for Information and Language Processing, LMU Munich; 3: Department of Engineering Science, University of Oxford.
Pseudocode | No | The paper describes its method using textual descriptions and mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | We make our code available at https://github.com/valentinhofmann/unsupervised_bias.
Open Datasets | Yes | We base our study on the Reddit Politosphere (Hofmann et al., 2022b), a dataset covering the political discourse on the social media platform Reddit from 2008 to 2019.
Dataset Splits | Yes | We split concepts and edges for each year into train (60%), dev (20%), and test (20%).
Hardware Specification | Yes | Experiments are performed on a GeForce GTX 1080 Ti GPU (11GB).
Software Dependencies | No | The paper mentions pretrained BERT and the Adam optimizer, but does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or other libraries.
Experiment Setup | Yes | We perform grid search for the learning rate r ∈ {1×10⁻⁴, 3×10⁻⁴, 1×10⁻³}. For the model used to find X, we further perform grid search for the orthogonality constant λo ∈ {1×10⁻³, 3×10⁻³, 1×10⁻²} as well as the sparsity constant λs ∈ {1×10⁻², 3×10⁻², 1×10⁻¹}. In total, there are 3 hyperparameter search trials for X and 27 for X per year. We use Adam (Kingma & Ba, 2015) as the optimizer. Both hidden layers of the graph auto-encoder have 10 dimensions.
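To make the scale of the search above concrete: the quoted grids yield 3 trials per year for the learning-rate-only search and 3 × 3 × 3 = 27 trials per year for the search that also covers the orthogonality and sparsity constants. Below is a minimal sketch of enumerating those configurations; the dictionary keys and variable names are illustrative assumptions, not taken from the authors' code.

```python
from itertools import product

# Grids quoted in the experiment setup above.
LEARNING_RATES = [1e-4, 3e-4, 1e-3]
LAMBDA_O = [1e-3, 3e-3, 1e-2]  # orthogonality constant
LAMBDA_S = [1e-2, 3e-2, 1e-1]  # sparsity constant

# 3 trials per year: learning rate only.
base_trials = [{"lr": lr} for lr in LEARNING_RATES]

# 27 trials per year: full grid over learning rate, orthogonality, and sparsity.
full_trials = [
    {"lr": lr, "lambda_o": lo, "lambda_s": ls}
    for lr, lo, ls in product(LEARNING_RATES, LAMBDA_O, LAMBDA_S)
]

assert len(base_trials) == 3
assert len(full_trials) == 27
```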
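Similarly, the per-year 60%/20%/20% split noted in the Dataset Splits row could be sketched roughly as follows; the edge-list representation, the function name, and the fixed seed are assumptions rather than details from the paper or its repository.

```python
import random

def split_edges(edges, seed=0):
    """Randomly split one year's edge list into train/dev/test (60/20/20)."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_train = int(0.6 * len(edges))
    n_dev = int(0.2 * len(edges))
    return (
        edges[:n_train],                 # train (60%)
        edges[n_train:n_train + n_dev],  # dev (20%)
        edges[n_train + n_dev:],         # test (20%)
    )

# Hypothetical usage, assuming edges_by_year maps a year to its edge list:
# train, dev, test = split_edges(edges_by_year[2016])
```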