Probing BERT in Hyperbolic Spaces
Authors: Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our probes on BERT, a typical contextualized embedding model. In a syntactic subspace, our probe better recovers tree structures than Euclidean probes, revealing the possibility that the geometry of BERT syntax may not necessarily be Euclidean. In a sentiment subspace, we reveal two possible meta-embeddings for positive and negative sentiments and show how lexically-controlled contextualization would change the geometric localization of embeddings. We demonstrate the findings with our Poincaré probe via extensive experiments and visualization. |
| Researcher Affiliation | Collaboration | ¹Alibaba Group, ²University of Edinburgh, ³Beijing Jiaotong University |
| Pseudocode | No | The paper describes the mathematical formulations and architecture of the Poincaré probe but does not include any pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our results can be reproduced at https://github.com/FranxYao/PoincareProbe. |
| Open Datasets | Yes | Specifically, we use the Penn Treebank dataset (Marcus & Marcinkiewicz, 1993) and reuse the data processing code in Hewitt & Manning (2019) to convert the data format to Stanford Dependency (de Marneffe et al., 2006). [...] We use the Movie Review dataset (Pang & Lee, 2005) with simple binary labels (positive and negative). Details of this dataset are in Appendix B. |
| Dataset Splits | Yes | Since there is no official split of this dataset, we randomly split 10% as dev and test set separately. The statistics can be found in Table 6. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the 'Adam' optimizer (Kingma & Ba, 2014) but does not specify versions of programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow) that would be needed to reproduce the experiment. |
| Experiment Setup | Yes | For optimization, we use the Adam optimizer (Kingma & Ba, 2014) initialized at a learning rate of 0.001 and train for up to 40 epochs. We decay the learning rate and perform model selection based on the dev loss. |
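
Since the paper provides no pseudocode (see the Pseudocode row above), the following is a minimal sketch of how a Poincaré distance probe of this kind is typically built, assuming PyTorch. The projection rank, the layer the hidden states come from, and the numerical constants are illustrative assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points u, v inside the unit Poincare ball."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    u_norm = torch.clamp(torch.sum(u ** 2, dim=-1), max=1 - eps)
    v_norm = torch.clamp(torch.sum(v ** 2, dim=-1), max=1 - eps)
    x = 1 + 2 * sq / ((1 - u_norm) * (1 - v_norm))
    return torch.acosh(x)

class PoincareProbe(nn.Module):
    """Project BERT vectors into a low-rank subspace, map them onto the
    Poincare ball via the exponential map at the origin, and read out
    pairwise hyperbolic distances as predicted tree distances."""
    def __init__(self, hidden_dim=768, probe_rank=64):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, probe_rank, bias=False)

    def exp_map_zero(self, x, eps=1e-5):
        # Exponential map at the origin keeps every point inside the unit ball.
        norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
        return torch.tanh(norm) * x / norm

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim) from one BERT layer
        h = self.exp_map_zero(self.proj(hidden_states))
        # pairwise distances between all token pairs: (batch, seq, seq)
        return poincare_distance(h.unsqueeze(2), h.unsqueeze(1))
```

The probe would then be trained so that these pairwise hyperbolic distances match gold dependency-tree distances, analogously to the Euclidean structural probe of Hewitt & Manning (2019).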
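
The Dataset Splits row reports a random 10% dev / 10% test split of the Movie Review data. A minimal sketch of such a split, assuming an in-memory list of examples and an arbitrary seed (both are illustrative choices, not the authors' code):

```python
import random

def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Randomly hold out dev and test portions; the remainder is training data."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_frac)
    n_test = int(len(examples) * test_frac)
    dev = examples[:n_dev]
    test = examples[n_dev:n_dev + n_test]
    train = examples[n_dev + n_test:]
    return train, dev, test
```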
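
The Experiment Setup row quotes Adam at learning rate 0.001, training for up to 40 epochs, learning-rate decay, and model selection on the dev loss. Below is a minimal training-loop sketch under those settings, assuming PyTorch; `probe`, the data loaders, `probe_loss`, and the ReduceLROnPlateau decay schedule are placeholders for illustration, not the authors' implementation.

```python
import copy
import torch

def train_probe(probe, train_loader, dev_loader, probe_loss, max_epochs=40, lr=1e-3):
    """Adam at lr 0.001, up to 40 epochs, decay the lr and select the model on dev loss."""
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5)
    best_dev, best_state = float("inf"), None
    for epoch in range(max_epochs):
        probe.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = probe_loss(probe, batch)  # e.g., distance loss against gold tree distances
            loss.backward()
            optimizer.step()
        probe.eval()
        with torch.no_grad():
            dev_loss = sum(probe_loss(probe, b).item() for b in dev_loader) / len(dev_loader)
        scheduler.step(dev_loss)             # decay the learning rate when dev loss plateaus
        if dev_loss < best_dev:              # model selection based on the dev loss
            best_dev, best_state = dev_loss, copy.deepcopy(probe.state_dict())
    probe.load_state_dict(best_state)
    return probe, best_dev
```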