WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia

Authors: Kenichiro Ando, Satoshi Sekine, Mamoru Komachi

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiment with automatic classification using competitive machine learning models, sentences that had problems with citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, by performing human annotation, we found that the model we developed performed better than the crowdsourced workers.
Researcher Affiliation | Academia | Kenichiro Ando (RIKEN AIP), Satoshi Sekine (RIKEN AIP), Mamoru Komachi (Hitotsubashi University)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The dataset and codes used in this work are publicly available: https://github.com/ken-ando/WikiSQE
Open Datasets | Yes | Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. The dataset and codes used in this work are publicly available: https://github.com/ken-ando/WikiSQE
Dataset Splits | Yes | For the development and test data, we randomly extract 500 positive and 500 negative examples each and concatenate them to make 1000 sentences. The remaining data were used as training data. (A split sketch follows this table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were provided in the paper.
Software Dependencies | No | The paper mentions models such as DeBERTa V3, BERT, and RoBERTa, and the pySBD tool, but does not specify version numbers for the underlying software libraries or frameworks (e.g., PyTorch, TensorFlow, or HuggingFace Transformers).
Experiment Setup | Yes | The maximum number of training epochs is 20, and the model that records the highest F1 on the development set is used as the best model to predict the test set. The learning rate is determined by searching among 1e-6, 5e-6, 1e-5, and 5e-5. The maximum input sequence length is 256 and the batch size is 64. In all setups, we report the average F1 values of the experiments with three different seeds. (A hyperparameter sketch follows this table.)
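
As a reading aid for the Dataset Splits row, here is a minimal sketch of how the described split could be reproduced, assuming each quality label yields a pool of positive (problematic) and negative sentences. The function name, label encoding, and seed value are illustrative assumptions, not taken from the released code.

```python
import random

def make_splits(pos_sents, neg_sents, n_eval=500, seed=0):
    """Dev and test each get 500 positive + 500 negative sentences (1,000 total);
    everything left over becomes training data, as described in the paper."""
    rng = random.Random(seed)
    pos = [(s, 1) for s in pos_sents]  # label encoding is an assumption
    neg = [(s, 0) for s in neg_sents]
    rng.shuffle(pos)
    rng.shuffle(neg)

    dev = pos[:n_eval] + neg[:n_eval]
    test = pos[n_eval:2 * n_eval] + neg[n_eval:2 * n_eval]
    train = pos[2 * n_eval:] + neg[2 * n_eval:]
    for split in (train, dev, test):
        rng.shuffle(split)
    return train, dev, test
```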
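
Similarly, the Experiment Setup row fixes the reported hyperparameters but not the training code. The sketch below only encodes the stated values (20 epochs, a learning-rate search over 1e-6/5e-6/1e-5/5e-5, max sequence length 256, batch size 64, F1 averaged over three seeds); the `train_and_eval` callable, its signature, and the seed values are assumptions.

```python
from statistics import mean

# Values reported in the paper's experiment setup.
LEARNING_RATES = [1e-6, 5e-6, 1e-5, 5e-5]  # search space
MAX_EPOCHS = 20
MAX_SEQ_LEN = 256
BATCH_SIZE = 64
SEEDS = [0, 1, 2]  # three seeds; the actual seed values are not stated

def run_grid(train_and_eval):
    """`train_and_eval` is an assumed callable that fine-tunes one classifier,
    keeps the checkpoint with the highest dev F1 within MAX_EPOCHS epochs,
    and returns that checkpoint's test F1. Returns mean test F1 per learning rate."""
    scores = {}
    for lr in LEARNING_RATES:
        per_seed = [
            train_and_eval(lr=lr, seed=s, epochs=MAX_EPOCHS,
                           max_len=MAX_SEQ_LEN, batch_size=BATCH_SIZE)
            for s in SEEDS
        ]
        scores[lr] = mean(per_seed)  # paper reports F1 averaged over three seeds
    return scores
```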