Isotropy in the Contextual Embedding Space: Clusters and Manifolds
Authors: Xingyu Cai, Jiaji Huang, Yuchen Bian, Kenneth Church
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we argue that isotropy indeed exists in the space, from a different but more constructive perspective. We identify isolated clusters and low-dimensional manifolds in the contextual embedding space, and introduce tools to both qualitatively and quantitatively analyze them. We hope the study in this paper can provide insights towards a better understanding of deep language models. We use the Penn Tree Bank (PTB) (Marcus et al., 1993) and WikiText-2 (Merity et al., 2016) datasets. PTB has 0.88 million words and WikiText-2 has 2 million. Both are standard datasets for language models. In the rest of the paper, we report on PTB, since we see similar results with both datasets. Figure 1 shows strong anisotropy effects in a number of models. These findings are consistent with Ethayarajh (2019), though we use slightly different metrics. The plots show expected cosine (Sinter and Sintra) as a function of layer (see the first sketch after the table). |
| Researcher Affiliation | Industry | Xingyu Cai, Jiaji Huang, Yuchen Bian, Kenneth Church Baidu Research, 1195 Bordeaux Dr, Sunnyvale, CA 94089, USA {xingyucai,huangjiaji,yuchenbian,kennethchurch}@baidu.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for this paper can be found at https://github.com/TideDancer/IsotropyContxt. |
| Open Datasets | Yes | We use Penn Tree Bank (PTB) (Marcus et al., 1993) and WikiText-2 (Merity et al., 2016) datasets. |
| Dataset Splits | No | The paper mentions using the Penn Tree Bank (PTB) and WikiText-2 datasets, which are standard, but does not explicitly state the train/validation/test splits used for its experiments. It only mentions using '20,000 sample vectors' to estimate the Silhouette score (a sketch of this estimate follows the table), which is subsampling for analysis, not a dataset split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software such as 'Huggingface' and 'AllenNLP' for pre-trained models, 'scikit-learn' for K-Means, and 'FAISS' for K-NN, but it does not specify version numbers for any of these dependencies. |
| Experiment Setup | No | The paper describes settings for its analysis, such as the models and datasets used and parameters for the analysis tools (e.g., 'We set K = 100' for LID estimation; see the LID sketch after the table). However, because it analyzes pre-trained models rather than training new ones, it does not provide typical training details such as hyperparameter values (learning rate, batch size, epochs), optimizer settings, or model initialization. |
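The anisotropy finding above is reported as expected cosine similarity, Sinter (between embeddings of different word types) and Sintra (between embeddings of the same word type), as a function of layer. Below is a minimal sketch of how such expectations could be estimated by Monte Carlo sampling; the function names, the sampling scheme, and the `n_pairs` default are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

def _cos(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def s_intra(emb_by_type, n_pairs=10000, seed=0):
    """Expected cosine between two contextual embeddings of the SAME word
    type, estimated by random sampling. `emb_by_type` maps each word type
    to an array of its contextual vectors at a given layer."""
    rng = np.random.default_rng(seed)
    types = [t for t, e in emb_by_type.items() if len(e) >= 2]
    sims = []
    for _ in range(n_pairs):
        e = emb_by_type[types[rng.integers(len(types))]]
        i, j = rng.choice(len(e), size=2, replace=False)
        sims.append(_cos(e[i], e[j]))
    return float(np.mean(sims))

def s_inter(emb_by_type, n_pairs=10000, seed=0):
    """Expected cosine between embeddings of two DIFFERENT word types."""
    rng = np.random.default_rng(seed)
    types = list(emb_by_type)
    sims = []
    for _ in range(n_pairs):
        a, b = rng.choice(len(types), size=2, replace=False)
        ea, eb = emb_by_type[types[a]], emb_by_type[types[b]]
        sims.append(_cos(ea[rng.integers(len(ea))], eb[rng.integers(len(eb))]))
    return float(np.mean(sims))
```

Running both estimators on each layer's embeddings and plotting the two curves against layer index would reproduce the shape of the anisotropy plots the response describes.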
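The Silhouette analysis is reported as using 20,000 sample vectors. Here is a sketch of how that could look with scikit-learn, whose `silhouette_score` supports subsampling directly; the `n_clusters` value and seed are placeholders, and whether the authors used this exact API path is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def sampled_silhouette(X, n_clusters=10, n_samples=20000, seed=0):
    """Cluster all embedding vectors with K-Means, then estimate the
    Silhouette score on a random subsample of 20,000 vectors, as the
    paper reports doing. n_clusters=10 is a placeholder, not a value
    taken from the paper."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels,
                            sample_size=min(n_samples, len(X)),
                            random_state=seed)
```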
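For local intrinsic dimension (LID), the paper sets K = 100 and uses FAISS for K-NN search. The sketch below implements the widely used Levina–Bickel maximum-likelihood LID estimator under those settings; that the authors use exactly this estimator variant (and an exact L2 index) is an assumption.

```python
import numpy as np
import faiss  # the paper mentions FAISS for K-NN search

def lid_mle(X, k=100):
    """Levina-Bickel MLE estimate of local intrinsic dimension (LID)
    at every point, from its k nearest neighbors (the paper sets K = 100).
    Returns one LID value per row of X."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    index = faiss.IndexFlatL2(X.shape[1])  # exact L2 search
    index.add(X)
    # k+1 neighbors, because the nearest hit is the query point itself.
    sq_dists, _ = index.search(X, k + 1)
    r = np.sqrt(np.maximum(sq_dists[:, 1:], 1e-12))  # drop self, guard log(0)
    # MLE: lid(x) = -1 / mean_i log(r_i / r_k), with r_k the k-th NN distance.
    return -1.0 / np.mean(np.log(r / r[:, -1:]), axis=1)
```

As a sanity check, isotropic Gaussian data in d dimensions should yield `lid_mle` values near d before the estimator is applied to contextual embeddings.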