WikiWrite: Generating Wikipedia Articles Automatically
Authors: Siddhartha Banerjee, Prasenjit Mitra
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our technique is able to reconstruct existing articles in Wikipedia with high accuracies. We also create several articles using our approach in the English Wikipedia, most of which have been retained in the online encyclopedia. |
| Researcher Affiliation | Academia | Siddhartha Banerjee, The Pennsylvania State University, University Park, PA, USA (sbanerjee@ist.psu.edu); Prasenjit Mitra, Qatar Computing Research Institute, HBKU, Doha, Qatar (pmitra@ist.psu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used the dump dated 2nd June, 2015 in our experiments. The total size of the corpus (only article contents) was close to 50 GB (~12 GB compressed). Our corpus finally contained 4.8 million articles. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (percentages, counts, or explicit standard splits) for its primary dataset or the sets used to train the classifiers. |
| Hardware Specification | Yes | We run all experiments on a computer with an i7 processor and 16 GB RAM. |
| Software Dependencies | No | The paper mentions using 'the gensim doc2vec package [Řehůřek et al., 2011]' but does not provide a specific version number for this package or any other software dependencies. |
| Experiment Setup | Yes | We use 100 dimensional vector representations (parameter D) for the entities and paragraphs of text. Lmax, the maximum number of words in each section, is dynamically set to the average number of words in the sections that were clustered together using RBR (see 3.2). For the classification task (assigning content into relevant sections in the article), we experimented with several machine learning classifiers (Random Forest, Naive-Bayes and Support Vector Machines). Random Forest (RF) [Breiman, 2001] performed the best in our classification task on existing Wikipedia articles and hence we report only the results using RF. (See the sketch below the table.) |
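
The Experiment Setup row above reports two concrete choices: 100-dimensional doc2vec representations and a Random Forest classifier for assigning paragraphs to article sections. The following is a minimal, hedged sketch of that pipeline. Since the paper does not specify software versions, it assumes the gensim 4.x API (`vector_size`) and uses scikit-learn's `RandomForestClassifier` as a stand-in for the paper's RF classifier; the toy corpus, section labels, epoch count, and tree count are illustrative placeholders, not values from the paper.

```python
# Sketch: 100-dim doc2vec paragraph embeddings + Random Forest section classifier.
# Assumes gensim 4.x and scikit-learn; only D = 100 comes from the paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy (paragraph text, section label) pairs; in the paper these would be
# drawn from existing Wikipedia articles.
paragraphs = [
    ("early life and education in a small town", "Early life"),
    ("signed with the record label and released an album", "Career"),
    ("won several awards for the debut album", "Awards"),
    ("retired and moved back to the hometown", "Later life"),
]

tagged = [TaggedDocument(words=text.split(), tags=[i])
          for i, (text, _) in enumerate(paragraphs)]

# D = 100 dimensional vectors, as reported in the Experiment Setup row.
model = Doc2Vec(vector_size=100, min_count=1, epochs=40)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

# Embed each paragraph, then train a Random Forest to predict its section.
X = [model.infer_vector(text.split()) for text, _ in paragraphs]
y = [label for _, label in paragraphs]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict(X_test))
```

In the paper, a trained classifier of this kind is used to place candidate content into the relevant sections of a generated article; the sketch only illustrates the shape of that step, not the authors' actual features or data.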