Simple or Complex? Learning to Predict Readability of Bengali Texts
Authors: Susmoy Chakraborty, Mir Tafseer Nayeem, Wasi Uddin Ahmad
AAAI 2021, pp. 12621-12629 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use our document level dataset to experiment with formula-based approaches and use the sentence-level dataset to train supervised neural models. ... We present the detailed ablation experiment results of our test set in Table 4. |
| Researcher Affiliation | Academia | Susmoy Chakraborty¹*, Mir Tafseer Nayeem¹*, Wasi Uddin Ahmad²; ¹Ahsanullah University of Science and Technology, ²University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Consonant Conjunct Count Algorithm. (See the sketch after the table.) |
| Open Source Code | Yes | We make our code & dataset publicly available at https://github.com/tafseer-nayeem/BengaliReadability for reproducibility. |
| Open Datasets | Yes | We make our code & dataset publicly available at https://github.com/tafseer-nayeem/BengaliReadability for reproducibility. ... We present several human-annotated corpora and dictionaries such as a document-level dataset comprising 618 documents with 12 different grade levels, a large-scale sentence-level dataset comprising more than 96K sentences with simple and complex labels... |
| Dataset Splits | Yes | Table 2: Statistics of the sentence-level dataset. ... Simple sentences (#Sents): Train 37,902 / Dev 1,100 / Test 1,100 |
| Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software such as the BNLP and iNLTK libraries but does not provide version numbers for these or any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | We use 60 as maximum sequence length with a batch size of 16, embedding size of 300, 64 LSTM hidden units, and Adam optimizer (Kingma and Ba 2015) with a learning rate of 0.001. We run the training for 50 epochs and check the improvement of validation (dev set) loss to save the latest best model during training. (See the training sketch after the table.) |
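
The Pseudocode row cites Algorithm 1 (Consonant Conjunct Count). The snippet below is a minimal Python sketch of one way to count consonant conjuncts, assuming a conjunct is any consonant + hasanta (virama, U+09CD) + consonant sequence; the paper's Algorithm 1 may handle additional edge cases, and the function names here are hypothetical.

```python
# Minimal sketch of a consonant conjunct counter for Bengali text.
# Assumption: a conjunct is a consonant + hasanta (U+09CD) + consonant
# sequence; the paper's Algorithm 1 may differ in details.

HASANTA = "\u09CD"

def is_bengali_consonant(ch: str) -> bool:
    """Bengali consonant block Ka..Ha, plus Khanda Ta, Rra, Rha, Yya."""
    cp = ord(ch)
    return (0x0995 <= cp <= 0x09B9) or cp in (0x09CE, 0x09DC, 0x09DD, 0x09DF)

def count_consonant_conjuncts(text: str) -> int:
    count = 0
    for i in range(1, len(text) - 1):
        if (text[i] == HASANTA
                and is_bengali_consonant(text[i - 1])
                and is_bengali_consonant(text[i + 1])):
            count += 1
    return count

# Example: "বিদ্যা" contains one conjunct (দ + hasanta + য).
print(count_consonant_conjuncts("বিদ্যা"))  # -> 1
```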
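
The Experiment Setup row reports the hyperparameters (sequence length 60, batch size 16, embedding size 300, 64 LSTM units, Adam at 0.001, 50 epochs, best model kept by dev-set loss). Below is a minimal Keras sketch wired with those values; the vocabulary size, the single-layer LSTM architecture, and the checkpoint filename are assumptions, not details taken from the paper or its repository.

```python
# Minimal Keras sketch of the reported training setup. Only the numbers
# quoted in the table (max length 60, batch 16, embedding 300, 64 LSTM
# units, Adam @ 0.001, 50 epochs, best model by dev loss) come from the
# paper; everything else is a placeholder.
import tensorflow as tf

VOCAB_SIZE = 30_000  # hypothetical; see the authors' repository for real values
MAX_LEN = 60

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 300),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # simple vs. complex label
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Keep the checkpoint with the lowest validation (dev set) loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_loss", save_best_only=True
)

# Usage (with sequences padded/truncated to MAX_LEN):
# model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
#           batch_size=16, epochs=50, callbacks=[checkpoint])
```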