Structure Regularization for Structured Prediction

Authors: Xu Sun

NeurIPS 2014

Reproducibility assessment (variable, result, and supporting excerpt from the LLM response):

Research Type: Experimental
"We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly-competitive tasks, achieving record-breaking accuracies yet with substantially faster training speed."

Researcher Affiliation: Academia
"Xu Sun, MOE Key Laboratory of Computational Linguistics, Peking University; School of Electronics Engineering and Computer Science, Peking University; xusun@pku.edu.cn"

Pseudocode: Yes
"Algorithm 1: Training with structure regularization"

Open Source Code: Yes
"See the code at http://klcl.pku.edu.cn/member/sunxu/code.htm"

Open Datasets: Yes
"Part-of-Speech Tagging (POS-Tagging). We use the standard benchmark dataset in prior work [3], with 38,219 training samples and 5,462 test samples. ... Biomedical Named Entity Recognition (Bio-NER). This task is from the BioNLP-2004 shared task [19]. There are 17,484 training samples and 3,856 test samples. ... Word Segmentation (Word-Seg). We use the MSR data provided by the SIGHAN-2004 contest [4]. There are 86,918 training samples and 3,985 test samples. ... Sensor-based Human Activity Recognition (Act-Recog). ... with the data extracted from the Bao04 activity recognition dataset [15]. ... There are 16,000 training samples and 4,000 test samples."

Dataset Splits: Yes
"For Weight Reg, the L2 regularization strengths (i.e., λ/2 in Eq.(8)) are tuned among values 0.1, 0.5, 1, 2, 5, and are determined on the development data (POS-Tagging) or simply via 4-fold cross validation on the training set (Bio-NER, Word-Seg, and Act-Recog)."

Hardware Specification: No
The paper does not describe the hardware used for the experiments (e.g., CPU or GPU models, memory, or cloud instances).

Software Dependencies: No
The paper mentions software components such as CRFs, structured perceptrons, and SGD, but it does not list version numbers for any programming languages, libraries, or frameworks used in the implementation or experiments.

Experiment Setup: Yes
"For Weight Reg, the L2 regularization strengths (i.e., λ/2 in Eq.(8)) are tuned among values 0.1, 0.5, 1, 2, 5, and are determined on the development data (POS-Tagging) or simply via 4-fold cross validation on the training set (Bio-NER, Word-Seg, and Act-Recog). With this automatic tuning for Weight Reg, we set 2, 5, 1 and 5 for POS-Tagging, Bio-NER, Word-Seg, and Act-Recog tasks, respectively. ... in experiments we use the SGD with decaying learning rate."
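The core idea behind the paper's Algorithm 1 ("Training with structure regularization") is to decompose each structured training sample into shorter mini-samples before training. A minimal Python sketch of that decomposition step, assuming random contiguous splits; the function name, interface, and split strategy are illustrative, not the paper's implementation:

```python
import random

def structure_regularize(sample, alpha, rng=random):
    """Randomly split one structured training sample (e.g., a list of
    tagged tokens) into `alpha` shorter contiguous mini-samples.
    alpha acts as the structure regularization strength: alpha = 1
    leaves the sample intact; larger alpha yields simpler structures."""
    n = len(sample)
    k = min(alpha, n)  # cannot produce more chunks than tokens
    if k <= 1:
        return [list(sample)]
    cuts = sorted(rng.sample(range(1, n), k - 1))  # k - 1 distinct cut points
    bounds = [0] + cuts + [n]
    return [sample[bounds[i]:bounds[i + 1]] for i in range(k)]
```

Training then proceeds on the resulting mini-samples in place of the full sequences, which is what the paper credits for both the reduced overfitting risk and the faster training.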
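The tuning procedure quoted above (selecting the L2 strength among 0.1, 0.5, 1, 2, 5 via 4-fold cross validation) can be sketched as follows. The `train_and_eval` callable is a hypothetical placeholder for training a model with strength `lam` and scoring it on held-out data:

```python
def tune_l2_by_cv(samples, train_and_eval, candidates=(0.1, 0.5, 1, 2, 5), k=4):
    """Return the candidate L2 strength with the best mean held-out
    score over k folds.  `train_and_eval(train, dev, lam)` must return
    an accuracy-like score (higher is better)."""
    best_lam, best_score = None, float("-inf")
    for lam in candidates:
        scores = []
        for i in range(k):
            dev = samples[i::k]  # every k-th sample as the held-out fold
            train = [s for j, s in enumerate(samples) if j % k != i]
            scores.append(train_and_eval(train, dev, lam))
        mean = sum(scores) / k
        if mean > best_score:
            best_lam, best_score = lam, mean
    return best_lam
```

This mirrors the paper's automatic tuning, which selected strengths 2, 5, 1, and 5 for the four tasks.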
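The paper states only that "SGD with decaying learning rate" is used, without naming a schedule. A common 1/t-style decay, shown here as a hedged sketch rather than the paper's actual schedule:

```python
def sgd_with_decay(grad, w, eta0, decay, steps):
    """Plain SGD with the decaying step size eta_t = eta0 / (1 + decay * t).
    This particular schedule is one common choice; the paper does not
    specify which decay it uses."""
    for t in range(steps):
        eta = eta0 / (1.0 + decay * t)  # learning rate shrinks over time
        w = w - eta * grad(w)
    return w
```

For example, minimizing the quadratic f(w) = (w - 3)^2 from w = 0 converges to w ≈ 3 under this schedule.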