Emergence of Separable Manifolds in Deep Language Representations
Authors: Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore representations from different model families (BERT, RoBERTa, GPT, etc.) and find evidence for emergence of linguistic manifolds across layer depth (e.g., manifolds for part-of-speech tags), especially in ambiguous data (i.e., words with multiple part-of-speech tags, or part-of-speech classes including many words). In addition, we find that the emergence of linear separability in these manifolds is driven by a combined reduction of manifold radius, dimensionality and inter-manifold correlations. |
| Researcher Affiliation | Collaboration | ¹Intel Labs, ²Massachusetts Institute of Technology, ³Harvard University, ⁴Columbia University. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/schung039/contextual-repr-manifolds. |
| Open Datasets | Yes | We use the Penn Treebank (PTB) (Marcus et al., 1993) and select 80 word manifolds based on most frequent words in the corpus. ... We use the semantic tagging (sem-tag) dataset by Abzianidze & Bos (2017)... We use the tags from the Ontonotes dataset (Weischedel et al., 2011). |
| Dataset Splits | Yes | With a train/test split of 10/90, the fraction of positive fields (i.e. accuracy) decreases across layers (Fig. 6, Top Left Inset). On the other hand, when we use the same train/test split of 80/20 used by Liu et al. (2019a), we recover their observation that the fraction of positive fields increases across the layers. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions models like BERT, RoBERTa, GPT, etc., but does not provide specific version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | No | The paper mentions model architectures (e.g., '12-layer transformer' and 'hidden size of 768') but does not provide specific details about the experimental setup such as hyperparameters (learning rate, batch size, optimizer settings, etc.) for training or fine-tuning. |
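The Research Type and Dataset Splits rows above refer to probing part-of-speech manifolds layer by layer under different train/test splits. The snippet below is a minimal illustrative sketch of that kind of probe, not the paper's manifold-capacity analysis: it extracts per-layer BERT hidden states for a few hand-picked (word, tag) examples and fits a linear classifier at each layer. The model name, toy sentences, first-sub-token pooling, and the 50/50 split are assumptions for the example; the paper uses PTB word manifolds and compares 10/90 and 80/20 splits.

```python
# Illustrative layer-wise linear probe (an assumption-laden sketch, not the
# paper's mean-field manifold analysis).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

# Toy (sentence, tagged-word index, POS tag) examples; in practice these come
# from PTB annotations, including ambiguous words such as "run"/"runs".
samples = [("the quick fox runs", 2, "NOUN"),
           ("she runs every day", 1, "VERB"),
           ("a long run helps",   2, "NOUN"),
           ("they run home now",  1, "VERB")]

def word_vectors(sentence, word_idx):
    """Return one vector per layer for the word at word_idx (first sub-token)."""
    enc = tokenizer(sentence.split(), is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states      # embeddings + 12 transformer layers
    tok_idx = enc.word_ids(0).index(word_idx)    # first sub-token of the target word
    return [h[0, tok_idx].numpy() for h in hidden]

per_layer = list(zip(*[word_vectors(s, i) for s, i, _ in samples]))  # layer -> vectors
labels = [tag for _, _, tag in samples]

for layer, feats in enumerate(per_layer):        # layer 0 is the embedding output
    X_tr, X_te, y_tr, y_te = train_test_split(
        np.stack(feats), labels, train_size=0.5, stratify=labels, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"layer {layer:2d}: linear-probe accuracy {acc:.2f}")
```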
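The Research Type row also attributes emerging separability to a combined reduction of manifold radius, dimensionality, and inter-manifold correlations; the analysis code behind those quantities is in the linked repository. As a rough stand-in only, the helper below computes two simple geometric proxies for a single category manifold: mean distance to the class centroid as a radius, and the participation ratio of the within-class covariance spectrum as an effective dimension. The function name and the random placeholder data are assumptions for illustration.

```python
# Rough geometric proxies (assumptions, not the paper's analysis): per-class
# radius and effective dimensionality of one category manifold.
import numpy as np

def manifold_radius_and_dim(class_vectors):
    """class_vectors: (n_points, n_features) array for one category manifold."""
    centroid = class_vectors.mean(axis=0)
    centered = class_vectors - centroid
    radius = np.linalg.norm(centered, axis=1).mean()
    # Participation ratio of the covariance eigenvalues as an effective dimension.
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)        # guard against small negative values
    dim = eigvals.sum() ** 2 / (eigvals ** 2).sum()
    return radius, dim

# Example with random stand-in data for one layer's NOUN manifold (50 x 768).
rng = np.random.default_rng(0)
noun_vectors = rng.normal(size=(50, 768))
r, d = manifold_radius_and_dim(noun_vectors)
print(f"radius ~ {r:.2f}, effective dimension ~ {d:.1f}")
```

Tracking these proxies across layers for each tag class would give a coarse picture of the shrinking-radius and shrinking-dimension trend the paper reports, though the paper's own measurements come from its manifold-capacity framework rather than these simple statistics.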