Deep Structured Prediction with Nonlinear Output Transformations

Authors: Colin Graber, Ofer Meshi, Alexander Schwing

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our proposed approach on real-world applications, including OCR, image tagging, multilabel classification and semantic segmentation. In each case, the proposed approach is able to improve task performance over deep structured baselines." Also, from Section 4 (Experiments): "We evaluate our non-linear structured deep net model on several diverse tasks: word recognition, image tagging, multilabel classification, and semantic segmentation."
Researcher Affiliation | Collaboration | Colin Graber (cgraber2@illinois.edu, University of Illinois at Urbana-Champaign); Ofer Meshi (meshi@google.com, Google); Alexander Schwing (aschwing@illinois.edu, University of Illinois at Urbana-Champaign)
Pseudocode | Yes | The paper provides Algorithm 1 (Inference Procedure) and Algorithm 2 (Weight Update Procedure).
Open Source Code | Yes | "Code available at: https://github.com/cgraber/NLStruct."
Open Datasets | Yes | "The dataset was constructed by taking a list of 50 common five-letter English words... by selecting a random image of each letter from the Chars74K dataset [11]"; "For this set of experiments, we compare against SPENs on the Bibtex and Bookmarks datasets used by Belanger and McCallum [3] and Tu and Gimpel [55]"; "Next, we train image tagging models using the MIRFLICKR25k dataset [20]"; "Finally, we run foreground-background segmentation on the Weizmann Horses database [5]". (A hedged sketch of the synthetic word-dataset construction follows the table.)
Dataset Splits | Yes | "The training, validation, and test sets for these experiments consist of 1,000, 200, and 200 words, respectively, generated in this way."; "The train/development/test sets for these experiments consisted of 10,000/5,000/10,000 images, respectively."; "We use train/validation/test splits of 196/66/66 images, respectively." (A hedged split sketch follows the table.)
Hardware Specification | No | "We thank NVIDIA for providing the GPUs used for this research." The paper mentions GPUs but does not specify particular models or other hardware details such as CPUs or memory.
Software Dependencies | No | "We implemented this non-linear structured deep net model using the PyTorch framework." The paper mentions PyTorch but does not provide version numbers for it or any other software dependencies.
Experiment Setup | Yes | "Both graph configurations of the Linear Top and NLTop models finished 400 epochs of training in approximately 2 hours"; "500/1000 pairs were chosen for the structured models for Bibtex and Bookmarks"; "We scale the input images such that the smaller dimension is 224 pixels long and take a center crop of 224x224 pixels; the same is done for the masks, except using a length of 64 pixels"; "the NLStruct model required approximately 10 hours to complete 160 training epochs." (A hedged preprocessing sketch follows the table.)
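
The Open Datasets row quotes the construction of the synthetic word-recognition data: each five-letter word is rendered by sampling one random Chars74K image per letter. Below is a minimal Python sketch of that idea; the directory layout (`chars74k/letters/<char>/`), the placeholder word list, and the 28x28 letter size are assumptions for illustration, not the authors' pipeline.

```python
import os
import random
from PIL import Image

# Hypothetical inputs: the paper's list of 50 common five-letter words and a
# Chars74K-style directory with one subfolder of letter images per character.
WORDS = ["about", "house", "world"]    # placeholder, not the real 50-word list
CHARS74K_DIR = "chars74k/letters"      # assumed layout: chars74k/letters/a/, .../b/, ...

def render_word(word, letter_size=(28, 28)):
    """Compose a word image by sampling one random Chars74K image per letter."""
    canvas = Image.new("L", (letter_size[0] * len(word), letter_size[1]))
    for i, ch in enumerate(word):
        folder = os.path.join(CHARS74K_DIR, ch)
        fname = random.choice(os.listdir(folder))
        letter = Image.open(os.path.join(folder, fname)).convert("L").resize(letter_size)
        canvas.paste(letter, (i * letter_size[0], 0))
    return canvas

def build_split(num_words, out_dir):
    """Generate word images; the paper uses 1,000/200/200 words for train/val/test."""
    os.makedirs(out_dir, exist_ok=True)
    for idx in range(num_words):
        word = random.choice(WORDS)
        render_word(word).save(os.path.join(out_dir, f"{idx:05d}_{word}.png"))
```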
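
The Dataset Splits row lists fixed split sizes per task (for example, 10,000/5,000/10,000 images for image tagging). A minimal sketch of drawing such a split with PyTorch utilities is shown below; the use of `random_split`, the placeholder dataset, and the fixed seed are assumptions, since the paper does not state how the splits were generated.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset standing in for the 25,000 MIRFLICKR images (an assumption).
full_dataset = TensorDataset(torch.arange(25_000))

# 10,000 / 5,000 / 10,000 train/dev/test images, matching the quoted split sizes.
# The fixed seed is an assumption; the paper does not describe how the split was drawn.
train_set, dev_set, test_set = random_split(
    full_dataset,
    [10_000, 5_000, 10_000],
    generator=torch.Generator().manual_seed(0),
)
print(len(train_set), len(dev_set), len(test_set))  # -> 10000 5000 10000
```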
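
The Experiment Setup row quotes the segmentation preprocessing: scale each image so the smaller side is 224 pixels, center-crop to 224x224, and apply the same procedure to the masks at 64 pixels. A minimal torchvision sketch of that preprocessing follows; the nearest-neighbor interpolation for masks is an assumption made to keep label values discrete, as the paper does not specify an interpolation mode.

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Input images: scale the smaller side to 224 px, then take a 224x224 center crop.
image_transform = transforms.Compose([
    transforms.Resize(224),        # resizes the smaller dimension to 224, keeping aspect ratio
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Segmentation masks: same procedure at 64 px. Nearest-neighbor interpolation is an
# assumption so that mask label values stay discrete.
mask_transform = transforms.Compose([
    transforms.Resize(64, interpolation=InterpolationMode.NEAREST),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])
```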