WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia

Authors: Kenichiro Ando, Satoshi Sekine, Mamoru Komachi

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiment with automatic classification using competitive machine learning models, sentences that had problems with citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, by performing human annotation, we found that the model we developed performed better than the crowdsourced workers.
Researcher Affiliation | Academia | Kenichiro Ando (RIKEN AIP), Satoshi Sekine (RIKEN AIP), Mamoru Komachi (Hitotsubashi University)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The dataset and codes used in this work are publicly available: https://github.com/ken-ando/WikiSQE
Open Datasets | Yes | Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. The dataset and codes used in this work are publicly available: https://github.com/ken-ando/WikiSQE
Dataset Splits | Yes | For the development and test data, we randomly extract 500 positive and 500 negative examples each and concatenate them to make 1000 sentences. The remaining data were used as training data. (A split sketch follows this table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided in the paper.
Software Dependencies | No | The paper mentions models such as DeBERTa V3, BERT, and RoBERTa, and the pySBD tool, but does not specify version numbers for the underlying software libraries or frameworks (e.g., PyTorch, TensorFlow, or HuggingFace Transformers).
Experiment Setup | Yes | The maximum number of training epochs is 20, and the model that records the highest F1 on the development set is used as the best model to predict the test set. The learning rate is determined by searching among 1e-6, 5e-6, 1e-5, and 5e-5. The maximum input sequence length is 256 and the batch size is 64. In all setups, we report the average F1 values of the experiments with three different seeds. (A hyperparameter sketch follows this table.)
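
As a reading aid for the Dataset Splits row, here is a minimal sketch of how the described split could be reproduced, assuming each quality label yields a pool of positive (problematic) and negative sentences. The function name, label encoding, and seed value are illustrative assumptions, not taken from the released code.

```python
import random

def make_splits(pos_sents, neg_sents, n_eval=500, seed=0):
    """Dev and test each get 500 positive + 500 negative sentences (1,000 total);
    everything left over becomes training data, as described in the paper."""
    rng = random.Random(seed)
    pos = [(s, 1) for s in pos_sents]  # label encoding is an assumption
    neg = [(s, 0) for s in neg_sents]
    rng.shuffle(pos)
    rng.shuffle(neg)

    dev = pos[:n_eval] + neg[:n_eval]
    test = pos[n_eval:2 * n_eval] + neg[n_eval:2 * n_eval]
    train = pos[2 * n_eval:] + neg[2 * n_eval:]
    for split in (train, dev, test):
        rng.shuffle(split)
    return train, dev, test
```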
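
Similarly, the Experiment Setup row fixes the reported hyperparameters but not the training code. The sketch below only encodes the stated values (20 epochs, a learning-rate search over 1e-6/5e-6/1e-5/5e-5, max sequence length 256, batch size 64, F1 averaged over three seeds); the `train_and_eval` callable, its signature, and the seed values are assumptions.

```python
from statistics import mean

# Values reported in the paper's experiment setup.
LEARNING_RATES = [1e-6, 5e-6, 1e-5, 5e-5]  # search space
MAX_EPOCHS = 20
MAX_SEQ_LEN = 256
BATCH_SIZE = 64
SEEDS = [0, 1, 2]  # three seeds; the actual seed values are not stated

def run_grid(train_and_eval):
    """`train_and_eval` is an assumed callable that fine-tunes one classifier,
    keeps the checkpoint with the highest dev F1 within MAX_EPOCHS epochs,
    and returns that checkpoint's test F1. Returns mean test F1 per learning rate."""
    scores = {}
    for lr in LEARNING_RATES:
        per_seed = [
            train_and_eval(lr=lr, seed=s, epochs=MAX_EPOCHS,
                           max_len=MAX_SEQ_LEN, batch_size=BATCH_SIZE)
            for s in SEEDS
        ]
        scores[lr] = mean(per_seed)  # paper reports F1 averaged over three seeds
    return scores
```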