Leveraging the Wikipedia Graph for Evaluating Word Embeddings
Authors: Joachim Giesen, Paul Kahlmeyer, Frank Nussbaum, Sina Zarrieß
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the WALES metric along three dimensions, namely, robustness with respect to the involved hyperparameters, comparison to baseline methods, and, of course, correlation with metrics based on collected human similarity judgements from the literature. Here, we use a sub-graph of the English Wikipedia hyperlink graph collected by the Stanford Network Analysis Project (SNAP) [Boldi et al., 2011]. Since WALES is a routing task, we further restrict the graph to the largest strongly connected component of the original graph. The resulting graph has n = 38 609 of the original 4 203 323 nodes, making it easier to handle and preventing impossible routing tasks. We demonstrate WALES on pre-trained word embeddings for Google's Word2Vec [Mikolov et al., 2013], Stanford's GloVe [Pennington et al., 2014], Facebook's fastText [Mikolov et al., 2018], and AllenNLP's ELMo [Peters et al., 2018] (for details and versions see the supplement). As can be seen in Figure 3, different embeddings lead to distinctly different behaviors of the information foraging agents. (A minimal code sketch of the restriction to the largest strongly connected component appears after the table.) |
| Researcher Affiliation | Academia | Friedrich Schiller University Jena; DLR Institute of Data Science; Bielefeld University |
| Pseudocode | No | The paper describes the agent's decision rule and metric computation but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide explicit statements about the release of its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Here, we use a sub-graph of the English Wikipedia hyperlink graph collected by the Stanford Network Analysis Project (SNAP) [Boldi et al., 2011]. ... We have scraped a human benchmark data set from the The Wiki Game website that is maintained by [Clemesha, 2018]. ... we used shortest paths m(s, t) from the full Wikipedia, which we obtained from the online service by [Wenger, 2018]. |
| Dataset Splits | Yes | A good approximation of the expected value is obtained for k = 1 000 tasks. ... Therefore, from now on, we use task sets of size 1 000. ... We respectively drew 1 000 tasks (start-target pairs of Wikipedia articles) from each distribution. |
| Hardware Specification | No | The paper does not specify the hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific word embedding models but does not provide version numbers for any software dependencies or libraries used for the implementation or experiments. |
| Experiment Setup | Yes | For $\gamma \in [0, 1]$, we define an agent based on the following decision rule for selecting a follow-up node: $v^* = \arg\max_{v \in T_i:\, \deg(v) \neq 0,\; m_i(v) < \infty} \cos(f(w_v), f(w_t)) - \gamma\, m_i(v)$. ... For this experiment, we sample nodes dependent on their number of incoming links (in-degree). ... we use a = 1, 2, 4, 8, 16, 32 as hyperparameters. As a second class, we consider uniform distributions on the top b percent of nodes with the largest in-degrees, where b = 100 amounts to the uniform distribution over all tasks. (Minimal code sketches of this decision rule and of the task-sampling distributions appear after the table.) |
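
The graph preprocessing quoted under Research Type (restricting the SNAP Wikipedia hyperlink sub-graph to its largest strongly connected component) could be reproduced along these lines. This is a minimal sketch, not the authors' code: the edge-list file name `wiki-hyperlinks.txt` and the use of networkx are assumptions about the data format and tooling.

```python
# Minimal sketch (not the authors' code): load a directed hyperlink graph
# from a SNAP-style edge list ("source target" per line) and keep only its
# largest strongly connected component, so that no routing task
# (start, target) is impossible. The file name is an assumed placeholder.
import networkx as nx

G = nx.read_edgelist("wiki-hyperlinks.txt", create_using=nx.DiGraph, nodetype=int)

# Largest strongly connected component of the directed graph.
largest_scc = max(nx.strongly_connected_components(G), key=len)
G_scc = G.subgraph(largest_scc).copy()

print(f"kept {G_scc.number_of_nodes()} of {G.number_of_nodes()} nodes")
```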
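The decision rule quoted under Experiment Setup amounts to a single greedy step. The sketch below only illustrates the scoring expression $\cos(f(w_v), f(w_t)) - \gamma\, m_i(v)$ over admissible follow-up nodes; the identifiers `f`, `m_i`, `out_degree`, and `candidates` are illustrative assumptions rather than names from the paper.

```python
# Minimal sketch of one greedy step of the information foraging agent.
# Assumptions: f(word) returns an embedding vector, m_i(node) returns the
# agent's current penalty term for that node, out_degree(node) gives its
# number of outgoing links, and candidates maps node ids to their words.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_next_node(candidates, f, m_i, w_target, out_degree, gamma):
    """Pick the node maximizing cos(f(w_v), f(w_t)) - gamma * m_i(v),
    skipping dead ends (out-degree 0) and nodes with unbounded m_i(v)."""
    best_node, best_score = None, -np.inf
    for v, w_v in candidates.items():
        if out_degree(v) == 0 or not np.isfinite(m_i(v)):
            continue
        score = cosine(f(w_v), f(w_target)) - gamma * m_i(v)
        if score > best_score:
            best_node, best_score = v, score
    return best_node
```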
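The two task distributions described under Experiment Setup (in-degree-dependent sampling with a ∈ {1, 2, 4, 8, 16, 32}, and uniform sampling over the top b percent of nodes by in-degree) might be drawn as follows. The exact in-degree weighting (here a power law P(v) ∝ deg_in(v)^a) is an assumption not spelled out in the quoted excerpt; k = 1000 matches the task-set size reported under Dataset Splits.

```python
# Minimal sketch of the two assumed task-sampling schemes; start == target
# pairs are not filtered out here for brevity.
import numpy as np

rng = np.random.default_rng(0)

def sample_tasks_power(nodes, in_degree, a, k=1000):
    """Draw k (start, target) pairs with P(v) proportional to in_degree[v]**a."""
    weights = np.array([in_degree[v] for v in nodes], dtype=float) ** a
    probs = weights / weights.sum()
    starts = rng.choice(nodes, size=k, p=probs)
    targets = rng.choice(nodes, size=k, p=probs)
    return list(zip(starts, targets))

def sample_tasks_top_b(nodes, in_degree, b, k=1000):
    """Draw k pairs uniformly from the top b percent of nodes by in-degree;
    b = 100 recovers the uniform distribution over all nodes."""
    ranked = sorted(nodes, key=lambda v: in_degree[v], reverse=True)
    top = ranked[: max(1, int(len(ranked) * b / 100))]
    starts = rng.choice(top, size=k)
    targets = rng.choice(top, size=k)
    return list(zip(starts, targets))
```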