Learning to Navigate Wikipedia by Taking Random Walks

Authors: Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments
Researcher Affiliation | Industry | Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus. DeepMind New York. {manzilzaheer, kmarino, wgrathwohl, jhtschultz, wendyshang, sbabayan, arahuja, idg, christinech, robfergus}@google.com
Pseudocode | No | The paper describes the methods in prose and equations (e.g., Sections 3 and 3.1) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The paper and supplemental provide full details on how to reproduce the simple approach.
Open Datasets | Yes | The particular web environment we consider here is Wikipedia, converted into a graph form. Each of the 38M paragraphs is represented by a node; edges are links within and between articles. We consider two snapshots from years 2017 and 2018. ... Finally, we use our navigation approach to gather information for the tasks of fact verification on the FEVER benchmark [Thorne et al., 2018] and question answering on Natural Questions (NQ) [Lee et al., 2019], which also use Wikipedia as a knowledge base.
Dataset Splits | Yes | For initial experiments, we also use a smaller subsampled graph, for which we sub-sample disjoint sets of 200k train / 200k evaluation nodes from the 2018 graph.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] The experiments were run on a proprietary infrastructure on which it is hard to estimate the total amount of compute.
Software Dependencies | No | The paper mentions models like RoBERTa, DistilBERT, and Mini BERT, and optimizers like Adam, but does not provide specific version numbers for any software libraries or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We use a training batch size of 2048, 50k warmup steps and 500k training steps, and Adam optimizer [Kingma and Ba, 2014] with learning rate 5e-4. ... The agent is then given a time budget B (in most experiments, we fix max steps to B = 100)...
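The Dataset Splits row reports disjoint sets of 200k training and 200k evaluation nodes sub-sampled from the 2018 graph. Below is a minimal sketch of one way to draw such disjoint node sets; the node identifiers, the random seed, and the use of Python's random module are assumptions for illustration, not details taken from the paper.

```python
import random

# Illustrative stand-ins for the paragraph-node identifiers of the 2018 graph
# (the paper reports roughly 38M paragraph nodes); the real node set is not reproduced here.
node_ids = range(38_000_000)

rng = random.Random(0)                  # assumed seed; the paper does not report one
sampled = rng.sample(node_ids, 400_000)

# Disjoint by construction: first 200k sampled nodes for training, next 200k for evaluation.
train_nodes = set(sampled[:200_000])
eval_nodes = set(sampled[200_000:])
assert train_nodes.isdisjoint(eval_nodes)
```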
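The Experiment Setup row lists a batch size of 2048, 50k warmup steps, 500k training steps, and Adam with a learning rate of 5e-4. The sketch below shows one way those numbers could be wired up in PyTorch; the linear warmup shape, the constant post-warmup rate, and the placeholder model are assumptions, since the paper only states the raw hyperparameters.

```python
import torch

# Hyperparameters as reported in the paper.
BATCH_SIZE = 2048        # training batch size
WARMUP_STEPS = 50_000    # warmup steps
TOTAL_STEPS = 500_000    # total training steps
PEAK_LR = 5e-4           # Adam learning rate

model = torch.nn.Linear(768, 768)   # placeholder; the paper's encoder is not reproduced here
optimizer = torch.optim.Adam(model.parameters(), lr=PEAK_LR)

def lr_scale(step: int) -> float:
    """Assumed schedule shape: linear warmup to the peak rate, then constant."""
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)

# A training loop would step the optimizer and scheduler once per batch of
# BATCH_SIZE examples, stopping after TOTAL_STEPS updates.
```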