Pixels to Graphs by Associative Embedding
Authors: Alejandro Newell, Jia Deng
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark on the Visual Genome dataset, and demonstrate state-of-the-art performance on the challenging task of scene graph generation. |
| Researcher Affiliation | Academia | Alejandro Newell Jia Deng Computer Science and Engineering University of Michigan, Ann Arbor {alnewell, jiadeng}@umich.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository was found. |
| Open Datasets | Yes | We evaluate the performance of our method on the Visual Genome dataset [14]. Visual Genome consists of 108,077 images annotated with object detections and object-object relationships, and it serves as a challenging benchmark for scene graph generation on real world images. |
| Dataset Splits | No | The paper states, "We use the same categories, as well as the same training and test split as defined by the authors [26]", but does not provide specific percentages or counts for a validation split within the text. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU models, or memory specifications) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | We train a stacked hourglass architecture [21] in TensorFlow [1]. (TensorFlow is mentioned, but no version number is given, and no other software dependencies with versions are listed.) |
| Experiment Setup | Yes | The input to the network is a 512x512 image, with an output resolution of 64x64. ... doubling the number of features to 512 at the two lowest resolutions of the hourglass. The output feature length f is 256. All losses (classification, bounding box regression, associative embedding) are weighted equally throughout the course of training. We set s_o = 3 and s_r = 6, which is sufficient to completely accommodate the detection annotations for all but a small fraction of cases. |
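The experiment-setup details quoted above can be collected into a single configuration sketch. The dictionary keys below are our own shorthand (the authors released no code), but the values are transcribed from the paper's text:

```python
# Hyperparameters transcribed from the paper's experiment-setup description.
# Key names are hypothetical shorthand, not from the authors' (unreleased) code.
HOURGLASS_CONFIG = {
    "input_resolution": (512, 512),   # network input: 512x512 image
    "output_resolution": (64, 64),    # output resolution: 64x64
    "low_res_features": 512,          # features doubled to 512 at the two lowest resolutions
    "output_feature_length": 256,     # output feature length f
    "s_o": 3,                         # object slots per output location
    "s_r": 6,                         # relationship slots per output location
    "loss_weights": {                 # all losses weighted equally
        "classification": 1.0,
        "bbox_regression": 1.0,
        "associative_embedding": 1.0,
    },
}
```

Such a sketch makes it easy to spot what a reimplementation would still have to guess, e.g. the optimizer, learning rate, and batch size, none of which are stated in the paper.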