Authorship Attribution Using a Neural Network Language Model
Authors: Zhenhao Ge, Yufang Sun, Mark Smith
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we investigate the performance of a feedforward NNLM on an authorship attribution problem, with moderate author set size and relatively limited data. We also consider how the text topics impact performance. Compared with a well-constructed N-gram baseline method with Kneser-Ney smoothing, the proposed method achieves nearly 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. |
| Researcher Affiliation | Academia | Zhenhao Ge, Yufang Sun and Mark J.T. Smith School of Electrical and Computer Engineering, Purdue University 465 Northwestern Ave, West Lafayette, Indiana, USA, 47907-2035 Emails: {zge, sun361, mjts}@purdue.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 2 is a diagram of the NNLM setup, not pseudocode. |
| Open Source Code | Yes | The source code, preprocessed datasets, a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution. |
| Open Datasets | Yes | The source code, preprocessed datasets, a detailed description of the methodology and results are available at https://github.com/zge/authorship-attribution. The database is composed of transcripts of 16 courses from Coursera, collected one sentence per line into a text file for each course. |
| Dataset Splits | Yes | In implementation, the processed text data for each course are randomly split into training, validation, and test sets with ratio 8:1:1. |
| Hardware Specification | No | The paper mentions 'less computationally expensive' but does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions general tools like 'NNLM toolkits' and 'SRILM' but does not specify software dependencies with version numbers for its own implementation. |
| Experiment Setup | Yes | We optimized a 4-gram NNLM with mini-batch training through 10 to 20 epochs for each course. The model parameters, such as the number of nodes in each layer, the learning rate, and the momentum, are customized to obtain the best individual models. |
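The 8:1:1 train/validation/test split described above is straightforward to reproduce. The sketch below is illustrative only (the function name, seed, and placeholder sentences are assumptions, not taken from the paper's released code):

```python
import random

def split_sentences(sentences, ratios=(8, 1, 1), seed=0):
    """Randomly split a list of sentences into train/valid/test
    sets with the 8:1:1 ratio reported in the paper (sketch)."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_valid = n * ratios[1] // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Example: 100 placeholder sentences -> 80 / 10 / 10
train, valid, test = split_sentences([f"sent {i}" for i in range(100)])
print(len(train), len(valid), len(test))  # 80 10 10
```

Splitting per course, as the paper does, keeps each author's topic distribution consistent across the three sets.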
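The 4-gram feedforward NNLM in the paper follows the Bengio-style architecture: the three context words are mapped through a shared embedding table, concatenated, passed through a tanh hidden layer, and projected to a softmax over the vocabulary. A minimal forward-pass sketch (all dimensions and weights below are toy placeholders, not the paper's tuned values):

```python
import numpy as np

def nnlm_forward(context_ids, C, H, U, b_h, b_o):
    """One forward pass of a feedforward 4-gram NNLM:
    predict the 4th word from the previous 3 context words.
    C: embedding table, H/U: hidden/output weights (placeholders)."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenated embeddings
    h = np.tanh(H @ x + b_h)                         # hidden layer
    logits = U @ h + b_o                             # output scores
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()                               # P(w | context)

rng = np.random.default_rng(0)
V, d, nh = 50, 8, 16             # toy vocab size, embedding dim, hidden dim
C = rng.normal(size=(V, d))      # word embedding table
H = rng.normal(size=(nh, 3 * d))
U = rng.normal(size=(V, nh))
probs = nnlm_forward([3, 7, 11], C, H, U, np.zeros(nh), np.zeros(V))
print(probs.shape, float(probs.sum()))  # a distribution over V words, summing to 1
```

In training, the paper tunes the layer sizes, learning rate, and momentum per course; the sketch above only shows the model shape, not the optimization.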
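The headline result compares the NNLM and the Kneser-Ney smoothed N-gram baseline by perplexity. For reference, perplexity is the exponential of the average negative log-probability per word; a sketch of the standard computation (the toy numbers are illustrative, not from the paper):

```python
import math

def perplexity(word_log_probs):
    """Perplexity = exp of the mean negative log-probability per word,
    the metric used to compare language models in the paper."""
    n = len(word_log_probs)
    return math.exp(-sum(word_log_probs) / n)

# Toy example: a model that assigns probability 0.25 to each of 4 words
lp = [math.log(0.25)] * 4
print(perplexity(lp))  # ~4.0: equivalent to choosing among 4 words uniformly
```

A 2.5% relative reduction in this quantity is the paper's reported gain over the baseline.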