WikiWrite: Generating Wikipedia Articles Automatically
Authors: Siddhartha Banerjee, Prasenjit Mitra
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our technique is able to reconstruct existing articles in Wikipedia with high accuracies. We also create several articles using our approach in the English Wikipedia, most of which have been retained in the online encyclopedia. |
| Researcher Affiliation | Academia | Siddhartha Banerjee, The Pennsylvania State University, University Park, PA, USA (sbanerjee@ist.psu.edu); Prasenjit Mitra, Qatar Computing Research Institute, HBKU, Doha, Qatar (pmitra@ist.psu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used the dump dated 2nd June, 2015 in our experiments. The total size of the corpus (only article contents) was close to 50 GB (~12 GB compressed). Our corpus finally contained 4.8 million articles. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits (percentages, counts, or explicit standard splits) for its primary dataset or the sets used to train the classifiers. |
| Hardware Specification | Yes | We run all experiments on a computer with an i7 processor and 16 GB RAM. |
| Software Dependencies | No | The paper mentions using 'the gensim doc2vec package [Řehůřek et al., 2011]' but does not provide a specific version number for this package or any other software dependencies. |
| Experiment Setup | Yes | We use 100 dimensional vector representations (parameter D) for the entities and paragraphs of text. Lmax, the maximum number of words in each section, is dynamically set to the average number of words in the sections that were clustered together using RBR (see 3.2). For the classification task (assigning content into relevant sections in the article), we experimented with several machine learning classifiers (Random Forest, Naive-Bayes and Support Vector Machines). Random Forest (RF) [Breiman, 2001] performed the best in our classification task on existing Wikipedia articles and hence we report only the results using RF. (See the sketch below the table.) |
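
The Experiment Setup row above reports two concrete choices: 100-dimensional doc2vec representations and a Random Forest classifier for assigning paragraphs to article sections. The following is a minimal, hedged sketch of that pipeline. Since the paper does not specify software versions, it assumes the gensim 4.x API (`vector_size`) and uses scikit-learn's `RandomForestClassifier` as a stand-in for the paper's RF classifier; the toy corpus, section labels, epoch count, and tree count are illustrative placeholders, not values from the paper.

```python
# Sketch: 100-dim doc2vec paragraph embeddings + Random Forest section classifier.
# Assumes gensim 4.x and scikit-learn; only D = 100 comes from the paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy (paragraph text, section label) pairs; in the paper these would be
# drawn from existing Wikipedia articles.
paragraphs = [
    ("early life and education in a small town", "Early life"),
    ("signed with the record label and released an album", "Career"),
    ("won several awards for the debut album", "Awards"),
    ("retired and moved back to the hometown", "Later life"),
]

tagged = [TaggedDocument(words=text.split(), tags=[i])
          for i, (text, _) in enumerate(paragraphs)]

# D = 100 dimensional vectors, as reported in the Experiment Setup row.
model = Doc2Vec(vector_size=100, min_count=1, epochs=40)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

# Embed each paragraph, then train a Random Forest to predict its section.
X = [model.infer_vector(text.split()) for text, _ in paragraphs]
y = [label for _, label in paragraphs]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict(X_test))
```

In the paper, a trained classifier of this kind is used to place candidate content into the relevant sections of a generated article; the sketch only illustrates the shape of that step, not the authors' actual features or data.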