Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition

Authors: Mohammed Haroon Dupty, Zhen Zhang, Wee Sun Lee (pp. 10737-10744)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that each part of the model improves performance and the combination outperforms state-of-the-art score on the Visual Genome (VG) and Visual Relationship Detection (VRD) datasets. We evaluate our method on the Visual Genome and the Visual Relationship Detection datasets."
Researcher Affiliation | Academia | Mohammed Haroon Dupty, Zhen Zhang, Wee Sun Lee, School of Computing, National University of Singapore. {dmharoon, leews}@comp.nus.edu.sg, zhen@zzhang.org
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/dmharoon/VRD-Tensor-Decompostion
Open Datasets | Yes | "We evaluate our method on the Visual Genome and the Visual Relationship Detection datasets." The Visual Genome dataset was released by Krishna et al. (2017); the VRD dataset was released by Lu et al. (2016) with a standard train/test split of 4000 and 1000 images respectively.
Dataset Splits | No | "The VRD dataset was released by (Lu et al. 2016) with standard train/test split 4000 and 1000 images respectively." No explicit validation split or detailed splits for Visual Genome are provided.
Hardware Specification | No | The paper does not describe the specific hardware (CPU or GPU models, memory, or cloud resources) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version details.
Experiment Setup | Yes | "We set the learning rate to 1e-4 and use SGD as optimizer. Due to the summation in the gradient term, there is an exploding gradient problem. To fix this, we clip the gradient based on the total norm of all the learnable weights. The norm value for gradient clipping is set at 20. We then train with proposals from the detector. We sample at most 4 proposals for every ground truth box proposal with IOU overlap of at least 0.5. All the layers before ROI-pooling are initialized by pretrained weights from the detector."
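The clipping step quoted above (rescaling gradients so that the total norm over all learnable weights stays at or below 20) can be sketched in plain Python. The function name and toy gradient values below are illustrative, not from the paper:

```python
import math

def clip_by_total_norm(grads, max_norm=20.0):
    """Clip gradients by the L2 norm computed jointly over all of them.

    `grads` is a list of gradient tensors, each flattened to a list of
    floats. If the combined norm exceeds `max_norm`, every gradient is
    scaled down by the same factor so the combined norm equals `max_norm`.
    Returns (possibly rescaled grads, original total norm).
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm

# Toy example: gradients for two parameter tensors, combined norm 50.
grads = [[30.0, 40.0], [0.0]]
clipped, norm_before = clip_by_total_norm(grads, max_norm=20.0)
# After clipping, the combined norm of `clipped` is 20.
```

In a PyTorch training loop (the framework the paper mentions), this behavior corresponds to calling `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)` between `loss.backward()` and `optimizer.step()`.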