Probing BERT in Hyperbolic Spaces

Authors: Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our probes on BERT, a typical contextualized embedding model. In a syntactic subspace, our probe better recovers tree structures than Euclidean probes, revealing the possibility that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically-controlled contextualization would change the geometric localization of embeddings. We demonstrate the findings with our Poincaré probe via extensive experiments and visualization.
Researcher Affiliation | Collaboration | Alibaba Group; University of Edinburgh; Beijing Jiaotong University
Pseudocode | No | The paper describes the mathematical formulations and architecture of the Poincaré probe but does not include any pseudocode blocks or algorithms (a hedged sketch of what such a probe computes follows the table).
Open Source Code | Yes | Our results can be reproduced at https://github.com/FranxYao/PoincareProbe.
Open Datasets | Yes | Specifically, we use the Penn Treebank dataset (Marcus & Marcinkiewicz, 1993) and reuse the data processing code in Hewitt & Manning (2019) to convert the data format to Stanford Dependency (de Marneffe et al., 2006). [...] We use the Movie Review dataset (Pang & Lee, 2005) with simple binary labels (positive and negative). Details of this dataset are in Appendix B.
Dataset Splits | Yes | Since there is no official split of this dataset, we randomly split 10% as dev and test set separately. The statistics can be found in Table 6. (An illustrative splitting procedure is sketched after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the 'Adam' optimizer (Kingma & Ba, 2014) but does not specify versions of programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) that would be needed to reproduce the experiment.
Experiment Setup | Yes | For optimization, we use the Adam (Kingma & Ba, 2014) initialized at learning rate 0.001 and train up to 40 epochs. We decay the learning rate and perform model selection based on the dev loss.
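
The "Pseudocode" row notes that the probe is specified only through its mathematical formulation. As a rough illustration of what such a formulation computes, the PyTorch sketch below projects BERT hidden states with a linear map, pushes them onto the Poincaré ball with an exponential map at the origin, and scores word pairs by Poincaré distance. The class name, projection sizes, and clamping constant are assumptions made for illustration, not the authors' implementation (see their repository for the actual probe).

```python
import torch
import torch.nn as nn

class PoincareProbeSketch(nn.Module):
    """Hedged sketch of a hyperbolic structural probe: map BERT states into
    the Poincare ball and return pairwise hyperbolic distances."""

    def __init__(self, hidden_dim=768, probe_dim=64, eps=1e-5):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, probe_dim)  # Euclidean projection (illustrative size)
        self.eps = eps

    def expmap0(self, x):
        # Exponential map at the origin of the unit-curvature Poincare ball:
        # exp_0(x) = tanh(||x||) * x / ||x||
        norm = x.norm(dim=-1, keepdim=True).clamp_min(self.eps)
        return torch.tanh(norm) * x / norm

    def poincare_distance(self, u, v):
        # d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
        sq_diff = (u - v).pow(2).sum(dim=-1)
        u_sq = u.pow(2).sum(dim=-1).clamp(max=1 - self.eps)
        v_sq = v.pow(2).sum(dim=-1).clamp(max=1 - self.eps)
        arg = 1 + 2 * sq_diff / ((1 - u_sq) * (1 - v_sq))
        return torch.acosh(arg.clamp_min(1 + self.eps))

    def forward(self, hidden_states):
        # hidden_states: (seq_len, hidden_dim) BERT vectors for one sentence
        z = self.expmap0(self.proj(hidden_states))  # points inside the ball
        # All-pairs hyperbolic distance matrix, (seq_len, seq_len)
        return self.poincare_distance(z.unsqueeze(1), z.unsqueeze(0))
```

For the syntactic probe, such a predicted distance matrix would typically be regressed against gold parse-tree distances, in the spirit of the Euclidean structural probe of Hewitt & Manning (2019) that the paper compares against.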
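For the "Dataset Splits" row, the described procedure (no official split; 10% held out as dev and another 10% as test) could be reproduced along the following lines; the function name and fixed seed are placeholders rather than details taken from the paper.

```python
import random

def random_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Hold out dev and test portions at random when a dataset has no official split."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```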
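The "Experiment Setup" row fixes the optimizer (Adam), the initial learning rate (0.001), the epoch budget (up to 40), learning-rate decay, and model selection on dev loss. A training skeleton consistent with that description is sketched below; the decay scheme (ReduceLROnPlateau) and its factor and patience are assumptions, since the paper only states that the rate is decayed, and `compute_loss` stands in for the unspecified probe loss.

```python
import copy
import torch

def train_probe(probe, train_loader, dev_loader, compute_loss, epochs=40, lr=1e-3):
    """Adam at lr 0.001, up to 40 epochs, decay the learning rate,
    and keep the checkpoint with the best dev loss."""
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    # One plausible decay scheme; the paper does not specify which is used.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=1)
    best_dev, best_state = float("inf"), None
    for _ in range(epochs):
        probe.train()
        for batch in train_loader:
            optimizer.zero_grad()
            compute_loss(probe, batch).backward()
            optimizer.step()
        probe.eval()
        with torch.no_grad():
            dev_loss = sum(compute_loss(probe, b).item() for b in dev_loader) / len(dev_loader)
        scheduler.step(dev_loss)
        if dev_loss < best_dev:  # model selection on the dev loss
            best_dev, best_state = dev_loss, copy.deepcopy(probe.state_dict())
    if best_state is not None:
        probe.load_state_dict(best_state)
    return probe
```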