reproducibilityindex.ai

Zipfian Whitening

Authors: Sho Yokoi, Han Bao, Hiroto Kurita, Hidetoshi Shimodaira

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation: We confirm the effectiveness of Zipfian whitening (Algorithm 1) by measuring performance on standard sentence-level downstream tasks using post-processed word vectors. We employed the most standard word embeddings, GloVe [43], word2vec [37], and fastText [11], and utilized the widely adopted evaluation tasks, including STS-B [15] and related benchmarks.
Researcher Affiliation	Academia	Sho Yokoi (Tohoku University / RIKEN, yokoi@tohoku.ac.jp); Han Bao (Kyoto University, bao@i.kyoto-u.ac.jp); Hiroto Kurita (Tohoku University, hiroto.kurita@dc.tohoku.ac.jp); Hidetoshi Shimodaira (Kyoto University / RIKEN, shimo@i.kyoto-u.ac.jp)
Pseudocode	Yes	The specific algorithm is given in Algorithm 1 ("Zipfian whitening; a post-processing algorithm on word embeddings").
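The core idea of Algorithm 1 is PCA whitening in which every word is weighted by its empirical (Zipfian) frequency rather than uniformly. The NumPy sketch below is our reading of that idea, not the code from the authors' repository: it assumes word vectors are stored as rows of a matrix `W` and unigram counts or probabilities in a vector `p`, and the function name `zipfian_whitening` is hypothetical. It centres with the frequency-weighted mean, then whitens with the frequency-weighted covariance obtained from an SVD of the sqrt(p)-scaled rows.

```python
import numpy as np


def zipfian_whitening(W: np.ndarray, p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical sketch: whiten word vectors under a frequency-weighted measure.

    W: (V, d) matrix, one row per word type.
    p: (V,) unigram counts or probabilities; normalised to sum to 1 here.
    Returns a (V, d) matrix whose p-weighted mean is 0 and whose
    p-weighted covariance is the identity (assuming full rank, V >= d).
    """
    W = np.asarray(W, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()                       # turn counts into a probability vector
    mu = p @ W                            # frequency-weighted mean, shape (d,)
    W_centered = W - mu                   # centre every row, rare words included
    # Scaling rows by sqrt(p) gives W_p^T W_p = sum_i p_i (w_i - mu)(w_i - mu)^T
    W_p = np.sqrt(p)[:, None] * W_centered
    _, S, Vt = np.linalg.svd(W_p, full_matrices=False)
    # Rotate onto the weighted principal axes, rescale each axis to unit weighted variance
    return W_centered @ Vt.T / np.maximum(S, eps)
```

Usage: `zipfian_whitening(W, counts)` returns vectors for the same vocabulary. The only difference from ordinary PCA whitening is the weight vector; passing `np.ones(V)` as `p` recovers the uniform version, which makes the effect of frequency weighting easy to compare.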
Open Source Code	Yes	https://github.com/cl-tohoku/zipfian-whitening
Open Datasets	Yes	We employed the most standard word embeddings, GloVe [43], word2vec [37], and fastText [11], and utilized the widely adopted evaluation tasks, including STS-B [15] and related benchmarks.
Dataset Splits	Yes	We used the MTEB [40] implementation (https://github.com/embeddings-benchmark/mteb) for the evaluation of the static word embeddings in Table 2, Table 8, and Table 9. For the evaluation of the dynamic word embeddings in Table 5 and Table 12, we used the implementation from the SimCSE paper [22] (https://github.com/princeton-nlp/SimCSE) to match the experimental setting.
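STS benchmarks such as STS-B are conventionally scored as the Spearman correlation between the cosine similarity of the two sentence vectors in each pair and the gold similarity score. The sketch below illustrates that protocol for static word vectors; it does not reproduce the MTEB or SimCSE evaluation code. It assumes sentences are pre-tokenised lists and that a sentence vector is the plain average of its in-vocabulary word vectors, and all function names (`sentence_vector`, `cosine`, `rank`, `spearman`, `sts_score`) are hypothetical.

```python
import numpy as np


def sentence_vector(tokens, vocab, W):
    """Assumed pooling: average the vectors of in-vocabulary tokens (zeros if none)."""
    ids = [vocab[t] for t in tokens if t in vocab]
    if not ids:
        return np.zeros(W.shape[1])
    return W[ids].mean(axis=0)


def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    return float(a @ b / max(np.linalg.norm(a) * np.linalg.norm(b), eps))


def rank(x):
    """0-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def sts_score(pairs, gold, vocab, W):
    """Score sentence pairs by cosine similarity and correlate with gold scores."""
    sims = [cosine(sentence_vector(a, vocab, W), sentence_vector(b, vocab, W))
            for a, b in pairs]
    return spearman(sims, gold)
```

Because Spearman correlation depends only on the ordering of similarities, any post-processing (such as whitening) matters for STS only insofar as it changes which pairs are ranked as more similar.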
Hardware Specification	Yes	We conducted all experiments using a single NVIDIA RTX 6000 Ada GPU with 48GB VRAM.
Software Dependencies	No	The paper mentions software tools like NLTK, MTEB, and SimCSE's implementation, but does not provide specific version numbers for these or other key software components used in their experiments.
Experiment Setup	Yes	We followed the hyperparameter choices of the original papers, with the dimensionality reduction parameter for ABTT set to D := 3, and the weighting parameter for SIF set to a := 10^-3.
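To show where the two hyperparameters enter, here is a sketch of the ABTT and SIF baselines following the recipes as commonly described in their original papers. It is not the code used in this study: the function names `abtt` and `sif_embeddings` are hypothetical, sentences are assumed to be pre-tokenised, `p` holds unigram frequencies, and ABTT here projects the centred vectors (some descriptions project the uncentred ones).

```python
import numpy as np


def abtt(W: np.ndarray, D: int = 3) -> np.ndarray:
    """All-but-the-top (sketch): centre, then remove the top-D principal directions."""
    W_centered = W - W.mean(axis=0)
    # Rows of Vt are the principal directions of the centred (unweighted) vectors
    _, _, Vt = np.linalg.svd(W_centered, full_matrices=False)
    U = Vt[:D].T                              # (d, D) basis of dominant directions
    return W_centered - W_centered @ U @ U.T  # subtract their projections


def sif_embeddings(sentences, vocab, W, p, a: float = 1e-3, remove_pc: bool = True):
    """SIF (sketch): a / (a + p(w))-weighted averages, then drop the first common component."""
    p = np.asarray(p, dtype=np.float64) / np.sum(p)
    rows = []
    for tokens in sentences:
        ids = [vocab[t] for t in tokens if t in vocab]
        if ids:
            weights = a / (a + p[ids])        # frequent words get small weights
            rows.append(weights @ W[ids] / len(ids))
        else:
            rows.append(np.zeros(W.shape[1]))
    S = np.vstack(rows)
    if remove_pc:
        # First right-singular vector of the (uncentred) sentence matrix
        _, _, Vt = np.linalg.svd(S, full_matrices=False)
        u = Vt[0]
        S = S - np.outer(S @ u, u)
    return S
```

Both baselines use word frequency or dominant directions only partially (SIF reweights but removes a uniform-PCA component; ABTT removes uniform-PCA directions), which is the contrast with a frequency-weighted whitening step.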