Automated Cross-prompt Scoring of Essay Traits

Authors: Robert Ridley, Liang He, Xin-yu Dai, Shujian Huang, Jiajun Chen

AAAI 2021, pp. 13745-13753 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on the widely used ASAP and ASAP++ datasets and demonstrate that our approach is able to outperform leading prompt-specific trait scoring and cross-prompt AES methods."
Researcher Affiliation | Academia | "Robert Ridley, Liang He, Xin-yu Dai,* Shujian Huang, Jiajun Chen. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210023, China. {robertr, heliang}@smail.nju.edu.cn, {daixinyu, huangsj, chenjj}@nju.edu.cn"
Pseudocode | No | The paper describes the model architecture and equations but does not include a pseudocode block or a clearly labeled algorithm.
Open Source Code | Yes | "Our code is available at https://github.com/robert1ridley/crossprompt-trait-scoring."
Open Datasets | Yes | "In this work, our experimentation is carried out on the Automated Student Assessment Prize (ASAP) dataset. ASAP is a large-scale dataset that was introduced as part of a Kaggle competition in 2012 and it has since become widely used in prompt-specific AES... The dataset is available at https://www.kaggle.com/c/asap-aes. Since only Prompts 7 and 8 possess trait scores, we additionally utilize the ASAP++ dataset (Mathias and Bhattacharyya 2018), which builds on top of the original ASAP dataset."
Dataset Splits | Yes | "Following research in cross-prompt holistic AES (Jin et al. 2018; Ridley et al. 2020), we perform prompt-wise cross validation, whereby essays for one prompt are used as test data and essays from the remaining prompts are used as training data. This is repeated for each prompt. In each case, the development set comprises essays from the same prompts as the training set." (A sketch of this split protocol follows the table.)
Hardware Specification | Yes | "We run each model five times on an NVIDIA GeForce GTX 1080 graphics card."
Software Dependencies | No | The paper mentions using 'python NLTK' for POS-tagging and models being 'implemented with Tensorflow in Python', but specific version numbers for NLTK, TensorFlow, or Python are not provided. (A sketch of the tagging dependency follows the table.)
Experiment Setup | Yes | "Optimization for all models is carried out with the RMSprop algorithm (Dauphin, de Vries, and Bengio 2015) with the learning rate set to 0.001. We train all models for a total of 50 epochs." (A sketch of this training configuration follows the table.)
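
To make the prompt-wise cross-validation protocol concrete, here is a minimal Python sketch of the eight-fold split. The record layout, the `prompt_wise_folds` helper, and the 90/10 train/dev cut are illustrative assumptions rather than the authors' code; the paper states only that the development set draws from the same prompts as the training set.

```python
# Illustrative sketch of prompt-wise cross-validation; not the authors' code.
# Each record is assumed to be a (prompt_id, features, trait_scores) tuple
# covering ASAP prompts 1-8.

def prompt_wise_folds(records, prompt_ids=range(1, 9), dev_fraction=0.1):
    """Yield (test_prompt, train, dev, test) for each held-out prompt."""
    for test_prompt in prompt_ids:
        # Essays from the held-out prompt form the test set.
        test = [r for r in records if r[0] == test_prompt]
        rest = [r for r in records if r[0] != test_prompt]
        # The dev set comes from the same prompts as the training set;
        # the 10% proportion here is an assumption, not stated in the paper.
        cut = int(len(rest) * (1 - dev_fraction))
        yield test_prompt, rest[:cut], rest[cut:], test
```

Each of the eight folds trains on essays from seven prompts and evaluates on the held-out prompt, matching the protocol quoted in the Dataset Splits row.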
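
The NLTK dependency noted under Software Dependencies amounts to calls like the following. The sentence is a made-up example, and no particular NLTK version is implied, since the paper pins none.

```python
import nltk

# One-time downloads of the tokenizer and POS-tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The essay develops its argument clearly.")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('essay', 'NN'), ('develops', 'VBZ'), ...]
```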
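
The Experiment Setup row translates directly into a Keras training configuration. In the sketch below, only the optimizer choice (RMSprop), the 0.001 learning rate, and the 50-epoch budget come from the paper; the `train` wrapper, the model, and the mean-squared-error loss are placeholder assumptions.

```python
import tensorflow as tf

def train(model, x_train, y_train, x_dev, y_dev):
    # RMSprop with lr=0.001 and 50 epochs are the paper's reported settings;
    # the loss is an assumption (trait scoring is a regression task).
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
        loss="mse",
    )
    model.fit(x_train, y_train, validation_data=(x_dev, y_dev), epochs=50)
    return model
```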