Deep Structured Prediction with Nonlinear Output Transformations
Authors: Colin Graber, Ofer Meshi, Alexander Schwing
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our proposed approach on real-world applications, including OCR, image tagging, multilabel classification and semantic segmentation. In each case, the proposed approach is able to improve task performance over deep structured baselines. Section 4 (Experiments): We evaluate our non-linear structured deep net model on several diverse tasks: word recognition, image tagging, multilabel classification, and semantic segmentation. |
| Researcher Affiliation | Collaboration | Colin Graber (cgraber2@illinois.edu), Ofer Meshi (meshi@google.com), Alexander Schwing (aschwing@illinois.edu); University of Illinois at Urbana-Champaign, Google |
| Pseudocode | Yes | Algorithm 1 (Inference Procedure) and Algorithm 2 (Weight Update Procedure) |
| Open Source Code | Yes | Code available at: https://github.com/cgraber/NLStruct. |
| Open Datasets | Yes | The dataset was constructed by taking a list of 50 common five-letter English words... by selecting a random image of each letter from the Chars74K dataset [11]... For this set of experiments, we compare against SPENs on the Bibtex and Bookmarks datasets used by Belanger and McCallum [3] and Tu and Gimpel [55]... Next, we train image tagging models using the MIRFLICKR25k dataset [20]... Finally, we run foreground-background segmentation on the Weizmann Horses database [5]... |
| Dataset Splits | Yes | [Word recognition] The training, validation, and test sets for these experiments consist of 1,000, 200, and 200 words, respectively, generated in this way. [Image tagging] The train/development/test sets for these experiments consisted of 10,000/5,000/10,000 images, respectively. [Segmentation] We use train/validation/test splits of 196/66/66 images, respectively. |
| Hardware Specification | No | We thank NVIDIA for providing the GPUs used for this research. The paper mentions GPUs but does not specify any particular models or other hardware details like CPUs or memory. |
| Software Dependencies | No | We implemented this non-linear structured deep net model using the PyTorch framework. The paper mentions PyTorch but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | Both graph configurations of the LinearTop and NLTop models finished 400 epochs of training in approximately 2 hours... 500/1000 pairs were chosen for the structured models for Bibtex and Bookmarks... We scale the input images such that the smaller dimension is 224 pixels long and take a center crop of 224x224 pixels; the same is done for the masks, except using a length of 64 pixels... the NLStruct model required approximately 10 hours to complete 160 training epochs. |
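
The preprocessing quoted in the Experiment Setup row (scale so the smaller dimension equals a target length, then take a centered square crop) can be illustrated with a little arithmetic. The sketch below is not taken from the authors' repository; the function name `resize_and_center_crop_box`, the rounding of the longer side, and the floor-based centering are assumptions made for illustration only.

```python
def resize_and_center_crop_box(width, height, target):
    """Compute the geometry of a shorter-side resize followed by a center crop.

    Returns (new_width, new_height, (left, top, right, bottom)), where the box
    is expressed in the coordinates of the resized image.
    Hypothetical helper: rounding and centering conventions are assumptions.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")

    # Scale factor that maps the smaller dimension onto `target`.
    scale = target / min(width, height)

    # Round the scaled sizes; clamp so neither side falls below `target`.
    new_width = max(target, round(width * scale))
    new_height = max(target, round(height * scale))

    # Center a target x target window inside the resized image.
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return new_width, new_height, (left, top, left + target, top + target)


if __name__ == "__main__":
    # Input images use a length of 224 pixels, masks use 64 (per the table).
    print(resize_and_center_crop_box(640, 480, 224))
    print(resize_and_center_crop_box(640, 480, 64))
```

Applying the same function with `target=64` to the masks keeps the crop aligned with the image crop up to rounding, which is presumably why the same procedure is used for both.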