Randomized Greedy Search for Structured Prediction: Amortized Inference and Learning

Authors: Chao Ma, F A Rezaur Rahman Chowdhury, Aryan Deshwal, Md Rakibul Islam, Janardhan Rao Doppa, Dan Roth

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Perform comprehensive experiments on ten diverse structured prediction (SP) tasks including sequence labeling, multi-label classification, co-reference resolution, and image segmentation. Results show that our approach is competitive or better than many state-of-the-art approaches in spite of its simplicity.
Researcher Affiliation | Academia | Chao Ma (School of EECS, Oregon State University); F A Rezaur Rahman Chowdhury, Aryan Deshwal, Md Rakibul Islam, and Janardhan Rao Doppa (School of EECS, Washington State University); Dan Roth (Department of Computer and Information Science, University of Pennsylvania)
Pseudocode | Yes | Algorithm 1: RGS(α) Inference Solver; Algorithm 2: Amortized RGS Inference; Algorithm 3: Structured Learning with Amortized RGS
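The row above names the paper's algorithms without reproducing their bodies. Below is a minimal Python sketch of RGS(α)-style inference for sequence labeling, under the assumption that each restart is seeded from a unary classifier ("guide") with probability α and uniformly at random otherwise, followed by greedy single-label hill climbing; the names `rgs_inference`, `score`, and `guide` are illustrative, not the paper's API.

```python
import random

def rgs_inference(score, labels, length, alpha=0.5, restarts=50,
                  guide=None, rng=random):
    """Sketch of RGS(alpha) inference.

    score:  function mapping a full output (tuple of labels) to a float
    guide:  optional per-position classifier, guide(i) -> label
    alpha:  probability of seeding a restart from `guide` instead of random
    """
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        # Initialization: with probability alpha, seed from the guide.
        if guide is not None and rng.random() < alpha:
            y = [guide(i) for i in range(length)]
        else:
            y = [rng.choice(labels) for _ in range(length)]
        cur = score(tuple(y))
        improved = True
        while improved:  # greedy hill climbing over single-label changes
            improved = False
            for i in range(length):
                for lab in labels:
                    if lab == y[i]:
                        continue
                    cand = y[:i] + [lab] + y[i + 1:]
                    s = score(tuple(cand))
                    if s > cur:
                        y, cur, improved = cand, s, True
        if cur > best_score:
            best, best_score = list(y), cur
    return best, best_score
```

With a separable scoring function, greedy flips reach the optimum from any restart; the restarts matter once the score includes pairwise (or higher-order) terms with local optima.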
Open Source Code | Yes | The code and data are publicly available on GitHub: https://github.com/nkg114mc/rgs-struct
Open Datasets | Yes | We employ five sequence labeling datasets. 1) Handwriting Recognition: We consider two variants [Daumé et al., 2009]: one fold for training and the remaining nine folds for testing in HW-Small, and vice versa in HW-Large. 2) NETtalk Stress: The task is to assign one of 5 stress labels to each letter of a word. 3) NETtalk Phoneme: Similar to the stress task, except the goal is to assign one of 51 phoneme labels. The training/testing split of NETtalk is 1000/1000. 4) Protein: The aim is to predict the secondary structure of amino-acid residues. The training/testing split is 111/17. 5) Twitter POS Tagging: A 25-POS-label dataset consisting of the 1000-tweet OCT27TRAIN, the 327-tweet OCT27DEV, and the 547-tweet DAILY547 as test set [Tu and Gimpel, 2018]. We employ three multi-label datasets, where the goal is to predict a binary vector corresponding to the relevant labels. 6) Yeast: There are 14 labels and a training/testing split of 1500/917. 7) Bibtex: There are 159 labels and a training/testing split of 4800/2515. 8) Bookmarks: There are 208 labels and a training/testing split of 60000/27856. We employ one coreference resolution dataset, where the goal is to cluster a set of textual mentions. 9) ACE2005: This is a corpus of English documents with 50 to 300 gold mentions in each document. We follow the standard training/testing split of 338/117 [Durrett and Klein, 2014]. We employ one image segmentation dataset, where the goal is to label each pixel in an image with its semantic label. 10) MSRC: This dataset contains 591 images and 21 labels. We employ the standard training/testing split of 276/256, and each image was pre-segmented into around 700 patches with the SLIC algorithm.
Dataset Splits | Yes | We employ a validation set to tune the hyper-parameters: C for Structured SVM and α ∈ [0, 1] for RGS inference. For MSRC and ACE2005, we use the standard development set; for the other datasets, we employ 20 percent of the training data as a validation set.
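The 20-percent hold-out described above can be sketched in a few lines. This is a generic shuffled split, not the paper's code; the function name and fixed seed are assumptions for illustration.

```python
import random

def split_train_val(data, val_frac=0.2, seed=0):
    """Hold out a fraction of the training data as a validation set
    (used here for tuning C and alpha on datasets without a standard
    development set)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    cut = int(len(data) * (1 - val_frac))
    train = [data[i] for i in idx[:cut]]
    val = [data[i] for i in idx[cut:]]
    return train, val
```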
Hardware Specification | Yes | All experiments were run on a machine with dual 6-core 2.67GHz Intel Xeon CPUs and 48GB memory.
Software Dependencies | No | The paper mentions the 'Illinois-SL library', an 'off-the-shelf logistic regression implementation', and a 'seq2seq implementation derived from tf-seq2seq' (implying TensorFlow), but provides no version numbers for any of these software components.
Experiment Setup | Yes | We employ a validation set to tune the hyper-parameters: C for Structured SVM and α ∈ [0, 1] for RGS inference. For this experiment, we employ 50 restarts, highest-order features, and optimize Hamming loss except for Yeast (F1 loss). The baseline RGS is run with 50 restarts. We employ a simple online learner based on gradient descent to learn E: learning rate η = 0.1 and five online learning iterations.
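The "simple online learner based on gradient descent" above can be sketched as plain SGD on a linear evaluation function E(x) = w · x. The paper only states η = 0.1 and five online iterations; the squared-error loss, the linear form, and the function name below are assumptions for illustration.

```python
def online_sgd(examples, dim, lr=0.1, epochs=5):
    """Online SGD sketch: learn weights w for a linear function E(x) = w . x
    by minimizing 0.5 * (E(x) - y)^2 one example at a time."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # gradient of 0.5 * (pred - y)^2 w.r.t. w is err * x
            for i in range(dim):
                w[i] -= lr * err * x[i]
    return w
```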