Automated Cross-prompt Scoring of Essay Traits

Authors: Robert Ridley, Liang He, Xin-yu Dai, Shujian Huang, Jiajun Chen

AAAI 2021, pp. 13745-13753 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on the widely used ASAP and ASAP++ datasets and demonstrate that our approach is able to outperform leading prompt-specific trait scoring and cross-prompt AES methods."
Researcher Affiliation | Academia | "Robert Ridley, Liang He, Xin-yu Dai,* Shujian Huang, Jiajun Chen. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023, China. {robertr, heliang}@smail.nju.edu.cn, {daixinyu, huangsj, chenjj}@nju.edu.cn"
Pseudocode | No | The paper describes the model architecture and equations but does not include a pseudocode block or a clearly labeled algorithm.
Open Source Code | Yes | "Our code is available at https://github.com/robert1ridley/crossprompt-trait-scoring."
Open Datasets | Yes | "In this work, our experimentation is carried out on the Automated Student Assessment Prize (ASAP) dataset. ASAP is a large-scale dataset that was introduced as part of a Kaggle competition in 2012 and it has since become widely used in prompt-specific AES... The dataset is available at https://www.kaggle.com/c/asap-aes. Since only Prompts 7 and 8 possess trait scores, we additionally utilize the ASAP++ dataset (Mathias and Bhattacharyya 2018), which builds on top of the original ASAP dataset."
Dataset Splits | Yes | "Following research in cross-prompt holistic AES (Jin et al. 2018; Ridley et al. 2020), we perform prompt-wise cross validation, whereby essays for one prompt are used as test data and essays from the remaining prompts are used as training data. This is repeated for each prompt. In each case, the development set comprises essays from the same prompts as the training set." (A sketch of this split protocol follows the table.)
Hardware Specification | Yes | "We run each model five times on an NVIDIA GeForce GTX 1080 graphics card."
Software Dependencies | No | The paper mentions using 'python NLTK' for POS-tagging and models being 'implemented with Tensorflow in Python', but specific version numbers for NLTK, TensorFlow, or Python are not provided. (A sketch of the tagging dependency follows the table.)
Experiment Setup | Yes | "Optimization for all models is carried out with the RMSprop algorithm (Dauphin, de Vries, and Bengio 2015) with the learning rate set to 0.001. We train all models for a total of 50 epochs." (A sketch of this training configuration follows the table.)
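
To make the prompt-wise cross-validation protocol concrete, here is a minimal Python sketch of the eight-fold split. The record layout, the `prompt_wise_folds` helper, and the 90/10 train/dev cut are illustrative assumptions rather than the authors' code; the paper states only that the development set draws from the same prompts as the training set.

```python
# Illustrative sketch of prompt-wise cross-validation; not the authors' code.
# Each record is assumed to be a (prompt_id, features, trait_scores) tuple
# covering ASAP prompts 1-8.

def prompt_wise_folds(records, prompt_ids=range(1, 9), dev_fraction=0.1):
    """Yield (test_prompt, train, dev, test) for each held-out prompt."""
    for test_prompt in prompt_ids:
        # Essays from the held-out prompt form the test set.
        test = [r for r in records if r[0] == test_prompt]
        rest = [r for r in records if r[0] != test_prompt]
        # The dev set comes from the same prompts as the training set;
        # the 10% proportion here is an assumption, not stated in the paper.
        cut = int(len(rest) * (1 - dev_fraction))
        yield test_prompt, rest[:cut], rest[cut:], test
```

Each of the eight folds trains on essays from seven prompts and evaluates on the held-out prompt, matching the protocol quoted in the Dataset Splits row.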
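
The NLTK dependency noted under Software Dependencies amounts to calls like the following. The sentence is a made-up example, and no particular NLTK version is implied, since the paper pins none.

```python
import nltk

# One-time downloads of the tokenizer and POS-tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The essay develops its argument clearly.")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('essay', 'NN'), ('develops', 'VBZ'), ...]
```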
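
The Experiment Setup row translates directly into a Keras training configuration. In the sketch below, only the optimizer choice (RMSprop), the 0.001 learning rate, and the 50-epoch budget come from the paper; the `train` wrapper, the model, and the mean-squared-error loss are placeholder assumptions.

```python
import tensorflow as tf

def train(model, x_train, y_train, x_dev, y_dev):
    # RMSprop with lr=0.001 and 50 epochs are the paper's reported settings;
    # the loss is an assumption (trait scoring is a regression task).
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
        loss="mse",
    )
    model.fit(x_train, y_train, validation_data=(x_dev, y_dev), epochs=50)
    return model
```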