Randomized Greedy Search for Structured Prediction: Amortized Inference and Learning
Authors: Chao Ma, F A Rezaur Rahman Chowdhury, Aryan Deshwal, Md Rakibul Islam, Janardhan Rao Doppa, Dan Roth
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform comprehensive experiments on ten diverse SP tasks including sequence labeling, multi-label classification, coreference resolution, and image segmentation. Results show that our approach is competitive with or better than many state-of-the-art approaches in spite of its simplicity. |
| Researcher Affiliation | Academia | Chao Ma¹, F A Rezaur Rahman Chowdhury², Aryan Deshwal², Md Rakibul Islam², Janardhan Rao Doppa² and Dan Roth³; ¹School of EECS, Oregon State University; ²School of EECS, Washington State University; ³Department of Computer and Information Science, University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 RGS(α) Inference Solver, Algorithm 2 Amortized RGS Inference, Algorithm 3 Structured Learning with Amortized RGS. (An illustrative sketch of the RGS loop follows the table.) |
| Open Source Code | Yes | The code and data are publicly available on GitHub: https://github.com/nkg114mc/rgs-struct |
| Open Datasets | Yes | We employ five sequence labeling datasets. 1) Handwriting Recognition: We consider two variants [Daumé et al., 2009]: one fold for training and the remaining nine folds for testing in HW-Small, and vice-versa in HW-Large. 2) NETtalk Stress: The task is to assign one of the 5 stress labels to each letter of a word. 3) NETtalk Phoneme: Similar to the stress task, except the goal is to assign one of the 51 phoneme labels. The training/testing split of NETtalk is 1000/1000. 4) Protein: The aim is to predict the secondary structure of amino-acid residues. The training/testing split is 111/17. 5) Twitter POS tagging: A 25-label POS dataset consisting of the 1000-tweet OCT27TRAIN, the 327-tweet OCT27DEV, and the 547-tweet DAILY547 as test set [Tu and Gimpel, 2018]. We employ three multi-label datasets, where the goal is to predict a binary vector corresponding to the relevant labels. 6) Yeast: There are 14 labels and a training/testing split of 1500/917. 7) Bibtex: There are 159 labels and a training/testing split of 4800/2515. 8) Bookmarks: There are 208 labels and a training/testing split of 60000/27856. We employ one coreference resolution dataset, where the goal is to cluster a set of textual mentions. 9) ACE2005: This is a corpus of English documents with 50 to 300 gold mentions in each document. We follow the standard training/testing split of 338/117 [Durrett and Klein, 2014]. We employ one image segmentation dataset, where the goal is to label each pixel in an image with its semantic label. 10) MSRC: This dataset contains 591 images and 21 labels. We employ the standard training/testing split of 276/256, and each image was pre-segmented into around 700 patches with the SLIC algorithm. The code and data are publicly available on GitHub: https://github.com/nkg114mc/rgs-struct |
| Dataset Splits | Yes | We employ a validation set to tune the hyper-parameters: C for Structured SVM and α ∈ [0, 1] for RGS inference. For MSRC and ACE2005, we use the standard development set, and we employ 20 percent of the training data as the validation set for the other datasets. (A sketch of this tuning protocol follows the table.) |
| Hardware Specification | Yes | All experiments were run on a machine with a dual-processor, 6-core 2.67GHz Intel Xeon CPU and 48GB of memory. |
| Software Dependencies | No | The paper mentions using the 'Illinois-SL library' and an 'off-the-shelf logistic regression implementation', and refers to a 'seq2seq implementation derived from tf-seq2seq' (implying TensorFlow), but no version numbers are given for any of these software components. |
| Experiment Setup | Yes | We employ a validation set to tune the hyper-parameters: C for Structured SVM and α ∈ [0, 1] for RGS inference. For this experiment, we employ 50 restarts, highest-order features, and optimize Hamming loss except for Yeast (F1 loss). The baseline RGS is run with 50 restarts. We employ a simple online learner based on gradient descent to learn E: learning rate η = 0.1 and five online learning iterations. (Illustrative sketches of the loss functions and this online learner follow the table.) |
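The Pseudocode row names Algorithm 1, the RGS(α) inference solver. As a rough illustration of what such a solver looks like, the Python sketch below runs a fixed number of random restarts and performs greedy search over single-variable label changes. The helper names (`score_fn`, `unary_guess`), the first-improvement move selection, and the reading of α as the probability of seeding each variable from a unary classifier rather than uniformly at random are assumptions made for this sketch, not details confirmed by the paper.

```python
import random

def rgs_inference(x, score_fn, labels, num_vars, restarts=50, alpha=0.5,
                  unary_guess=None):
    """Randomized greedy search over a discrete structured output.

    score_fn(x, y) -> float scores a complete labeling y (a list of labels).
    unary_guess(x, i) -> label is an optional per-variable classifier; with
    probability alpha each variable is seeded from it, otherwise uniformly
    at random (one plausible reading of the alpha parameter).
    """
    best_y, best_score = None, float("-inf")
    for _ in range(restarts):
        # Randomized initialization of the full output vector.
        y = [unary_guess(x, i) if unary_guess and random.random() < alpha
             else random.choice(labels)
             for i in range(num_vars)]
        score = score_fn(x, y)
        improved = True
        while improved:  # first-improvement greedy local search
            improved = False
            for i in range(num_vars):
                for lab in labels:
                    if lab == y[i]:
                        continue
                    candidate = y[:i] + [lab] + y[i + 1:]
                    cand_score = score_fn(x, candidate)
                    if cand_score > score:
                        y, score, improved = candidate, cand_score, True
        if score > best_score:  # keep the best local optimum over all restarts
            best_y, best_score = y, score
    return best_y, best_score
```

With `restarts=50`, the sketch matches the restart budget reported in the Experiment Setup row.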
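The Dataset Splits row states that C (Structured SVM) and α ∈ [0, 1] (RGS) were tuned on a validation set, carved out as 20 percent of the training data where no standard development set exists. A minimal grid-search sketch of that protocol is below; the specific grids and the `train_fn`/`eval_fn` helpers are hypothetical placeholders, not values from the paper.

```python
import random

def tune_hyperparameters(train_data, train_fn, eval_fn,
                         C_grid=(0.01, 0.1, 1.0, 10.0),
                         alpha_grid=(0.0, 0.25, 0.5, 0.75, 1.0),
                         val_fraction=0.2, seed=0):
    """Grid search over C (Structured SVM) and alpha (RGS) on a held-out
    validation set carved out of the training data."""
    data = list(train_data)
    random.Random(seed).shuffle(data)
    n_val = int(len(data) * val_fraction)
    val_set, train_set = data[:n_val], data[n_val:]

    best_C, best_alpha, best_acc = None, None, float("-inf")
    for C in C_grid:
        for alpha in alpha_grid:
            model = train_fn(train_set, C=C, alpha=alpha)  # e.g. Structured SVM trained with RGS(alpha) inference
            acc = eval_fn(model, val_set, alpha=alpha)     # validation accuracy (higher is better)
            if acc > best_acc:
                best_C, best_alpha, best_acc = C, alpha, acc
    return best_C, best_alpha
```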
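The Experiment Setup row notes that Hamming loss is optimized for all tasks except Yeast, which uses F1 loss. For concreteness, the sketch below gives standard definitions of both losses over a single output vector; the exact variants used in the paper may differ slightly.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of output positions where the prediction disagrees with the truth."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_loss(y_true, y_pred):
    """1 - F1 over the positive entries of a binary label vector (multi-label case)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == 1 for p in y_pred)
    true_pos = sum(t == 1 for t in y_true)
    if pred_pos == 0 or true_pos == 0:
        return 1.0
    precision, recall = tp / pred_pos, tp / true_pos
    if precision + recall == 0:
        return 1.0
    return 1.0 - 2 * precision * recall / (precision + recall)
```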
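Finally, the Experiment Setup row describes a simple online gradient-descent learner for E with learning rate η = 0.1 and five online iterations. The sketch below follows those two reported settings, but the linear form of E and the squared-error objective are assumptions made for illustration; the paper excerpt does not specify them.

```python
import numpy as np

def online_train_E(examples, feature_fn, dim, lr=0.1, epochs=5):
    """Online gradient descent for a linear function E(x, y) = w . phi(x, y).

    examples: iterable of (x, y, target) triples, where target is the value
    E should predict for candidate output y on input x.
    """
    w = np.zeros(dim)
    for _ in range(epochs):            # five online passes, as reported
        for x, y, target in examples:
            phi = feature_fn(x, y)     # joint feature vector, shape (dim,)
            pred = float(w @ phi)
            # Gradient step on the squared error 0.5 * (pred - target) ** 2
            w -= lr * (pred - target) * phi
    return w
```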