Learning Distributed Representations for Structured Output Prediction

Authors: Vivek Srikumar, Christopher D. Manning

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on two tasks which have semantically rich labels: multiclass classification on the newsgroup data and part-of-speech tagging for English and Basque. In all cases, we show that DISTRO outperforms the structural SVM baselines. We demonstrate the effectiveness of DISTRO on two tasks: document classification (purely atomic structures) and part-of-speech (POS) tagging (both atomic and compositional structures). In both cases, we compare to structural SVMs, i.e. the case of one-hot label vectors, as the baseline. Table 1 reports the performance of the baseline and variants of DISTRO for newsgroup classification. Table 2 presents the results for the two languages."
Researcher Affiliation | Academia | "Vivek Srikumar, University of Utah, svivek@cs.utah.edu; Christopher D. Manning, Stanford University, manning@cs.stanford.edu"
Pseudocode | Yes | "Algorithm 1: Learning algorithm by alternating minimization. The goal is to solve min_{w, A} f(w, A). The input to the problem is a training set of examples consisting of pairs of labeled inputs (x_i, y_i) and T, the number of iterations." (A hedged sketch of this alternating loop appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "We used the bydate version of the data with tokens as features. Table 1 reports the performance of the baseline and variants of DISTRO for newsgroup classification. The 20 Newsgroups Dataset [13]. English POS tagging has been long studied using the Penn Treebank data [15]. We used the Basque data from the CoNLL 2007 shared task [17] for training the Basque POS tagger." (A loading sketch for the 20 Newsgroups data appears after the table.)
Dataset Splits | Yes | "We selected the hyper-parameters for all experiments by cross validation. We used the standard train-test split [8, 24]: we trained on sections 0-18 of the Treebank and report performance on sections 22-24." (A sketch of the section-based split appears after the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the 'Stanford NLP pipeline' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | "We selected the hyper-parameters for all experiments by cross validation. We ran the alternating algorithm for 5 epochs for all cases with 5 epochs of SGD for both the weight and label vectors. We allowed the baseline to run for 25 epochs over the data."
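
The paper's Algorithm 1 alternates between optimizing the weight vector and the label vectors. Below is a minimal Python sketch of that loop for the multiclass case, using the reported schedule (5 outer rounds, 5 SGD epochs per subproblem). The bilinear scoring form score(x, y) = A[y] @ W @ x, the learning rate, and the regularization constant are illustrative assumptions, not details from the paper.

```python
import numpy as np

def hinge_argmax(W, A, x, y_true):
    """Loss-augmented inference: argmax_y score(x, y) + 1[y != y_true]."""
    scores = A @ (W @ x)                                  # one score per label
    scores += (np.arange(len(scores)) != y_true)          # Hamming augmentation
    return int(np.argmax(scores))

def train_distro(data, n_labels, d_label, d_input,
                 T=5, sgd_epochs=5, lr=0.01, reg=1e-4, seed=0):
    """Alternating minimization of f(w, A): SGD on W with A fixed,
    then SGD on the label vectors A with W fixed, for T rounds.
    The scoring form and hyper-parameter values here are guesses."""
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.1, size=(n_labels, d_label))   # distributed label vectors
    W = np.zeros((d_label, d_input))                      # weights ("w")
    for _ in range(T):
        # Step 1: fix A, take hinge-loss subgradient steps on W.
        for _ in range(sgd_epochs):
            for x, y in data:
                y_hat = hinge_argmax(W, A, x, y)
                W *= 1.0 - lr * reg                       # L2 shrinkage
                if y_hat != y:
                    W += lr * np.outer(A[y] - A[y_hat], x)
        # Step 2: fix W, take subgradient steps on the label vectors A.
        for _ in range(sgd_epochs):
            for x, y in data:
                y_hat = hinge_argmax(W, A, x, y)
                if y_hat != y:
                    Wx = W @ x
                    A[y] += lr * Wx                       # pull true label up
                    A[y_hat] -= lr * Wx                   # push prediction down
    return W, A
```

With A frozen at one-hot vectors, Step 1 reduces to the structural SVM baseline the paper compares against.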
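For the newsgroup experiments, the paper uses the bydate version of 20 Newsgroups with tokens as features. scikit-learn's loader serves exactly this bydate train/test split, so a setup along these lines can reproduce the data; raw token counts via CountVectorizer are my approximation of "tokens as features":

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# scikit-learn ships the "bydate" version: ~11,314 train / 7,532 test documents.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vectorizer = CountVectorizer()                 # raw token counts as features
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)
print(X_train.shape, X_test.shape)
```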
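The English POS split (train on WSJ sections 0-18, test on 22-24) is conventionally materialized from the LDC Treebank-3 directory layout. The root path and .mrg file pattern below are assumptions about that layout, since the paper gives only the section numbers:

```python
import glob
import os

def wsj_files(root, sections):
    """Collect parsed WSJ files for the given section numbers."""
    files = []
    for sec in sections:
        files.extend(sorted(glob.glob(os.path.join(root, f"{sec:02d}", "wsj_*.mrg"))))
    return files

ROOT = "treebank_3/parsed/mrg/wsj"             # hypothetical local path
train_files = wsj_files(ROOT, range(0, 19))    # sections 00-18
test_files = wsj_files(ROOT, range(22, 25))    # sections 22-24
```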