Getting in Shape: Word Embedding SubSpaces

Authors: Tianyuan Zhou, João Sedoc, Jordan Rodu

IJCAI 2019

Reproducibility assessment (variable, result, and the LLM's supporting response):
Research Type: Experimental
LLM Response: "In this section we provide some theoretical underpinnings for the phenomena observed in our experiments. Lemma 1 shows that when a source representation is aligned to a target representation using a linear transformation, the column space of the aligned representation is determined by the column space of the source, but the singular value structure of the source is entirely discarded. Theorem 1 guarantees the existence of a lower-bounded singular value. (...) In this section, we show some empirical results of word representation alignment. Our key finding suggests that isotropy is important to successful alignment."
Researcher Affiliation: Academia
LLM Response: "Tianyuan Zhou (1), João Sedoc (2) and Jordan Rodu (1); (1) Department of Statistics, University of Virginia; (2) Department of Computer and Information Science, University of Pennsylvania; tz8hu@virginia.edu, joao@cis.upenn.edu, jsr6q@virginia.edu"
Pseudocode: No
LLM Response: No pseudocode or algorithm blocks are present.
Open Source Code: Yes
LLM Response: "Link of Supplementary Materials and Source codes: https://github.com/NoahZhouTianyuan/ConceptorOnNondistEmbedding"
Open Datasets: Yes
LLM Response: "We perform multiple experiments using distributional word representations (each 300-dimensional) including word2vec [Mikolov et al., 2013b] (Google News), GloVe [Pennington et al., 2014] (840 billion Common Crawl) and FastText [Bojanowski et al., 2017] (Common Crawl without subword), as our source embeddings, and align them through linear regression to various target representations. We then test the aligned word vectors on seven similarity tasks [Faruqui and Dyer, 2014], and in some cases an additional three concept categorization tasks as supplement."
Dataset Splits: No
LLM Response: No specific training/validation/test split percentages or counts are provided for the datasets used in the experiments.
Hardware Specification: No
LLM Response: No explicit hardware specifications (e.g., GPU/CPU models, memory details) are mentioned for the experimental setup.
Software Dependencies: No
LLM Response: The paper mentions software like word2vec, GloVe, and FastText, but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment.
Experiment Setup: No
LLM Response: The paper describes the word representations used (e.g., "each 300-dimensional") and the alignment method ("linear regression") but does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings for the experimental setup.
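The alignment procedure the quoted passages describe (least-squares linear regression from a source embedding matrix to a target representation) can be sketched as follows. This is a hedged illustration on synthetic toy matrices, not the authors' code: it demonstrates the claim attributed to Lemma 1 that the aligned representation keeps the column space of the source while its singular-value structure is discarded.

```python
# Illustrative sketch (synthetic data, not the authors' code): align an
# anisotropic "source" matrix X to a "target" Y by least-squares linear
# regression, then check that the aligned matrix X @ B stays inside the
# column space of X while X's singular-value structure is discarded.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50                      # toy sizes; the paper uses 300-dim vectors

# Anisotropic source: column scales spread over two orders of magnitude.
X = rng.normal(size=(n, d)) * np.linspace(1.0, 100.0, d)
Y = rng.normal(size=(n, d))          # stand-in target representation

# Linear-regression alignment: B = argmin_B ||X B - Y||_F
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
aligned = X @ B

# Column space is preserved: projecting `aligned` onto the column space
# of X (via an orthonormal QR basis) leaves it unchanged, up to
# numerical error.
Q, _ = np.linalg.qr(X)
assert np.allclose(Q @ (Q.T @ aligned), aligned)

# Singular-value structure is discarded: the aligned spectrum bears no
# resemblance to the source spectrum.
s_src = np.linalg.svd(X, compute_uv=False)
s_al = np.linalg.svd(aligned, compute_uv=False)
print("source top singular value :", float(s_src[0]))
print("aligned top singular value:", float(s_al[0]))
```

In connection with the paper's finding that isotropy matters for successful alignment, one crude diagnostic is the ratio of smallest to largest singular value (`s[-1] / s[0]`): the anisotropic source above has a ratio far below 1, whereas a perfectly isotropic representation would have a ratio of 1.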