Simple or Complex? Learning to Predict Readability of Bengali Texts

Authors: Susmoy Chakraborty, Mir Tafseer Nayeem, Wasi Uddin Ahmad
Pages: 12621-12629

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use our document level dataset to experiment with formula-based approaches and use the sentence-level dataset to train supervised neural models. ... We present the detailed ablation experiment results of our test set in Table 4.
Researcher Affiliation | Academia | Susmoy Chakraborty (1*), Mir Tafseer Nayeem (1*), Wasi Uddin Ahmad (2); (1) Ahsanullah University of Science and Technology, (2) University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Consonant Conjunct Count Algorithm. (An illustrative sketch of conjunct counting is given after this table.)
Open Source Code | Yes | We make our code & dataset publicly available at https://github.com/tafseer-nayeem/BengaliReadability for reproducibility.
Open Datasets | Yes | We make our code & dataset publicly available at https://github.com/tafseer-nayeem/BengaliReadability for reproducibility. ... We present several human-annotated corpora and dictionaries such as a document-level dataset comprising 618 documents with 12 different grade levels, a large-scale sentence-level dataset comprising more than 96K sentences with simple and complex labels...
Dataset Splits | Yes | Table 2: Statistics of the sentence-level dataset. ... Simple sentences (#Sents): Train 37,902, Dev 1,100, Test 1,100.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory) to run its experiments.
Software Dependencies | No | The paper mentions software like the BNLP library and the iNLTK library, but does not provide specific version numbers for these or any other ancillary software dependencies required for reproducibility.
Experiment Setup | Yes | We use 60 as maximum sequence length with a batch size of 16, embedding size of 300, 64 LSTM hidden units, and Adam optimizer (Kingma and Ba 2015) with a learning rate of 0.001. We run the training for 50 epochs and check the improvement of validation (dev set) loss to save the latest best model during training.
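
The paper's Algorithm 1 (Consonant Conjunct Count) is not reproduced in this report. The following is a minimal sketch of one plausible way to count consonant conjuncts in Bengali text, assuming conjuncts are detected via the hasanta/virama joiner (U+09CD) flanked by two consonants; the exact character set and counting rule are assumptions and may differ from the authors' released implementation.

```python
# Minimal sketch of consonant-conjunct counting for Bengali text.
# Assumption: a conjunct is signalled by hasanta/virama (U+09CD) joining
# two consonants; the paper's Algorithm 1 may use different rules.

HASANTA = "\u09cd"
# Approximate Bengali consonant set: U+0995..U+09B9 plus a few extra letters.
CONSONANTS = {chr(c) for c in range(0x0995, 0x09BA)} | {"\u09dc", "\u09dd", "\u09df"}

def count_consonant_conjuncts(text: str) -> int:
    """Count consonant + hasanta + consonant sequences in a Bengali string."""
    count = 0
    for i in range(1, len(text) - 1):
        if (text[i] == HASANTA
                and text[i - 1] in CONSONANTS
                and text[i + 1] in CONSONANTS):
            count += 1
    return count

# Example: "শিক্ষা" contains the conjunct ক্ষ (ka + hasanta + ssa) -> 1
print(count_consonant_conjuncts("শিক্ষা"))
```

Note that a cluster of three consonants contains two hasantas and would be counted twice under this rule; how such clusters are tallied is one of the details the paper's algorithm may handle differently.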
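
The quoted training configuration can be expressed compactly in code. The sketch below is a hypothetical Keras reconstruction using only the stated hyperparameters (sequence length 60, batch size 16, embedding size 300, 64 LSTM hidden units, Adam with learning rate 0.001, 50 epochs, best model saved on dev loss); the Bi-LSTM architecture, vocabulary size, and checkpoint path are assumptions, not the authors' implementation.

```python
# Hypothetical Keras sketch of the quoted training configuration.
# Only the hyperparameters below are taken from the paper; the architecture,
# VOCAB_SIZE, and file names are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

MAX_LEN = 60        # maximum sequence length (inputs padded/truncated to this)
BATCH_SIZE = 16
EMB_DIM = 300       # embedding size
HIDDEN_UNITS = 64   # LSTM hidden units
EPOCHS = 50
VOCAB_SIZE = 30000  # placeholder; depends on the fitted tokenizer

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMB_DIM),            # 300-dim embeddings
    layers.Bidirectional(layers.LSTM(HIDDEN_UNITS)),   # 64 LSTM units
    layers.Dense(1, activation="sigmoid"),             # simple vs. complex label
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Save the latest best model whenever validation (dev set) loss improves.
checkpoint = callbacks.ModelCheckpoint(
    "best_model.keras", monitor="val_loss", save_best_only=True
)

# Training call (X_* are integer sequences padded to MAX_LEN):
# model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
#           batch_size=BATCH_SIZE, epochs=EPOCHS, callbacks=[checkpoint])
```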