Automated Cross-prompt Scoring of Essay Traits
Authors: Robert Ridley, Liang He, Xin-yu Dai, Shujian Huang, Jiajun Chen
Pages: 13745-13753
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the widely used ASAP and ASAP++ datasets and demonstrate that our approach is able to outperform leading prompt-specific trait scoring and cross-prompt AES methods. |
| Researcher Affiliation | Academia | Robert Ridley, Liang He, Xin-yu Dai,* Shujian Huang, Jiajun Chen National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023, China {robertr, heliang}@smail.nju.edu.cn, {daixinyu, huangsj, chenjj}@nju.edu.cn |
| Pseudocode | No | The paper describes the model architecture and equations but does not include a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | Yes | Our code is available at https://github.com/robert1ridley/crossprompt-trait-scoring. |
| Open Datasets | Yes | In this work, our experimentation is carried out on the Automated Student Assessment Prize (ASAP) dataset. ASAP is a large-scale dataset that was introduced as part of a Kaggle competition in 2012 and it has since become widely used in prompt-specific AES... The dataset is available at https://www.kaggle.com/c/asap-aes. Since only Prompts 7 and 8 possess trait scores, we additionally utilize the ASAP++ dataset (Mathias and Bhattacharyya 2018), which builds on top of the original ASAP dataset. |
| Dataset Splits | Yes | Following research in cross-prompt holistic AES (Jin et al. 2018; Ridley et al. 2020), we perform prompt-wise cross validation, whereby essays for one prompt are used as test data and essays from the remaining prompts are used as training data. This is repeated for each prompt. In each case, the development set comprises essays from the same prompts as the training set. |
| Hardware Specification | Yes | We run each model five times on an NVIDIA GeForce GTX 1080 graphics card |
| Software Dependencies | No | The paper mentions using 'python NLTK' for POS-tagging and models being 'implemented with TensorFlow in Python', but specific version numbers for NLTK, TensorFlow, or Python are not provided. |
| Experiment Setup | Yes | Optimization for all models is carried out with the RMSprop algorithm (Dauphin, de Vries, and Bengio 2015) with the learning rate set to 0.001. We train all models for a total of 50 epochs. |
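The prompt-wise cross-validation protocol quoted above (test on one prompt, train on essays from all remaining prompts, repeat for each prompt) can be sketched in Python. This is an illustrative sketch only: the `essays` records, their `prompt` field, and the `prompt_wise_splits` helper are hypothetical stand-ins, not the actual ASAP/ASAP++ loaders from the authors' repository.

```python
def prompt_wise_splits(essays):
    """Yield (test_prompt, train_set, test_set) for each prompt.

    `essays` is a list of dicts with at least a 'prompt' key. For each
    prompt, its essays form the test set and all other prompts' essays
    form the training pool; per the paper, the development set is then
    drawn from the same prompts as the training set.
    """
    prompts = sorted({e["prompt"] for e in essays})
    for test_prompt in prompts:
        train = [e for e in essays if e["prompt"] != test_prompt]
        test = [e for e in essays if e["prompt"] == test_prompt]
        yield test_prompt, train, test


if __name__ == "__main__":
    # Toy data: three prompts with two essays each.
    essays = [{"prompt": p, "id": i} for p in (1, 2, 3) for i in range(2)]
    for test_prompt, train, test in prompt_wise_splits(essays):
        # Invariants of leave-one-prompt-out splitting.
        assert all(e["prompt"] == test_prompt for e in test)
        assert all(e["prompt"] != test_prompt for e in train)
        print(test_prompt, len(train), len(test))
    # In the paper's setup, each fold's model would then be trained with
    # tf.keras.optimizers.RMSprop(learning_rate=0.001) for 50 epochs.
```

Each fold holds out exactly one prompt, so with the eight ASAP prompts this yields eight train/test configurations, matching the report's "repeated for each prompt" description.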