Authorship Attribution Using a Neural Network Language Model
Authors: Zhenhao Ge, Yufang Sun, Mark Smith
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we investigate the performance of a feedforward NNLM on an authorship attribution problem, with moderate author set size and relatively limited data. We also consider how the text topics impact performance. Compared with a well-constructed N-gram baseline method with Kneser-Ney smoothing, the proposed method achieves nearly 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. |
| Researcher Affiliation | Academia | Zhenhao Ge, Yufang Sun and Mark J.T. Smith School of Electrical and Computer Engineering, Purdue University 465 Northwestern Ave, West Lafayette, Indiana, USA, 47907-2035 Emails: {zge, sun361, mjts}@purdue.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 2 is a diagram of the NNLM setup, not pseudocode. |
| Open Source Code | Yes | The source code, preprocessed datasets, a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution. |
| Open Datasets | Yes | The source code, preprocessed datasets, a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution. The database is composed of transcripts of 16 courses from Coursera, collected one sentence per line into a text file for each course. |
| Dataset Splits | Yes | In implementation, the processed text data for each course are randomly split into training, validation, and test sets with ratio 8:1:1. |
| Hardware Specification | No | The paper mentions 'less computationally expensive' but does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions general tools like 'NNLM toolkits' and 'SRILM' but does not specify software dependencies with version numbers for its own implementation. |
| Experiment Setup | Yes | We optimized a 4-gram NNLM with mini-batch training through 10 to 20 epochs for each course. The model parameters, such as the number of nodes in each layer, the learning rate, and the momentum, are customized to obtain the best individual models. |
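The 8:1:1 train/validation/test split described above is straightforward to reproduce. The sketch below is illustrative only (the function name, seed, and placeholder sentences are assumptions, not taken from the paper's released code):

```python
import random

def split_sentences(sentences, ratios=(8, 1, 1), seed=0):
    """Randomly split a list of sentences into train/valid/test
    sets with the 8:1:1 ratio reported in the paper (sketch)."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_valid = n * ratios[1] // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Example: 100 placeholder sentences -> 80 / 10 / 10
train, valid, test = split_sentences([f"sent {i}" for i in range(100)])
print(len(train), len(valid), len(test))  # 80 10 10
```

Splitting per course, as the paper does, keeps each author's topic distribution consistent across the three sets.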
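The 4-gram feedforward NNLM in the paper follows the Bengio-style architecture: the three context words are mapped through a shared embedding table, concatenated, passed through a tanh hidden layer, and projected to a softmax over the vocabulary. A minimal forward-pass sketch (all dimensions and weights below are toy placeholders, not the paper's tuned values):

```python
import numpy as np

def nnlm_forward(context_ids, C, H, U, b_h, b_o):
    """One forward pass of a feedforward 4-gram NNLM:
    predict the 4th word from the previous 3 context words.
    C: embedding table, H/U: hidden/output weights (placeholders)."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenated embeddings
    h = np.tanh(H @ x + b_h)                         # hidden layer
    logits = U @ h + b_o                             # output scores
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()                               # P(w | context)

rng = np.random.default_rng(0)
V, d, nh = 50, 8, 16             # toy vocab size, embedding dim, hidden dim
C = rng.normal(size=(V, d))      # word embedding table
H = rng.normal(size=(nh, 3 * d))
U = rng.normal(size=(V, nh))
probs = nnlm_forward([3, 7, 11], C, H, U, np.zeros(nh), np.zeros(V))
print(probs.shape, float(probs.sum()))  # a distribution over V words, summing to 1
```

In training, the paper tunes the layer sizes, learning rate, and momentum per course; the sketch above only shows the model shape, not the optimization.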
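The headline result compares the NNLM and the Kneser-Ney smoothed N-gram baseline by perplexity. For reference, perplexity is the exponential of the average negative log-probability per word; a sketch of the standard computation (the toy numbers are illustrative, not from the paper):

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp of the mean negative log-probability per word,
    the metric used to compare language models in the paper."""
    n = len(word_log_probs)
    return math.exp(-sum(word_log_probs) / n)

# Toy example: a model that assigns probability 0.25 to each of 4 words
lp = [math.log(0.25)] * 4
print(perplexity(lp))  # ~4.0: equivalent to choosing among 4 words uniformly
```

A 2.5% relative reduction in this quantity is the paper's reported gain over the baseline.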