Ladder Loss for Coherent Visual-Semantic Embedding

Authors: Mo Zhou, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, Gang Hua

AAAI 2020, pp. 13050-13057

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple datasets validate the efficacy of our proposed method, which achieves significant improvement over existing state-of-the-art methods.
Researcher Affiliation | Collaboration | Xidian University, Alibaba Group, Xi'an Jiaotong University, HERE Technologies, Wormpex AI Research
Pseudocode | No | The paper describes its approach and loss functions using mathematical formulations and textual explanations, but does not include a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Following related works, Flickr30K (Plummer et al. 2015) and MS-COCO (Lin et al. 2014; Chen et al. 2015) datasets are used in our experiments.
Dataset Splits | Yes | For Flickr30K, we use 1,000 images for validation, 1,000 for testing and the rest for training, which is consistent with (Faghri et al. 2018). For MS-COCO, we also follow (Faghri et al. 2018) and use 5,000 images for both validation and testing. Meanwhile, the remaining 30,504 images in the original validation set are used for training (113,287 training images in total) in our experiments, following (Faghri et al. 2018).
Hardware Specification | Yes | The BERT inference is highly computationally expensive (e.g., a single NVIDIA Titan Xp GPU could compute similarity scores for only approximately 65 sentence pairs per second).
Software Dependencies | No | The paper mentions software like PyTorch, BERT, CBoW, GloVe, and the Adam solver, but does not provide specific version numbers for any of these components.
Experiment Setup | Yes | The dimension of the GRU and the joint embedding space is set at D = 1024. The dimension of the word embeddings used as input to the GRU is set to 300. Additionally, the Adam solver is used for optimization, with the learning rate set at 2e-4 for 15 epochs, and then decayed to 2e-5 for another 15 epochs. We use a mini-batch of size 128 in all experiments in this paper. ...the threshold θ1 for splitting N^q_1 and N^q_2 is fixed at 0.63, and the margins α1 = 0.2, α2 = 0.01, the loss weights β1 = 1, β2 = 0.25.
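
The hyperparameters quoted in the Experiment Setup row (relevance threshold θ1 = 0.63, margins α1 = 0.2 and α2 = 0.01, loss weights β1 = 1 and β2 = 0.25) suggest the shape of a two-rung ladder loss. Since no code is released with the paper, the following is only a minimal PyTorch-style sketch under stated assumptions, not the authors' implementation: the function name two_rung_ladder_loss, the hardest/easiest-negative mining, and the use of a batch-wise similarity matrix with ground-truth pairs on the diagonal are all illustrative choices.

```python
# Hedged sketch of a two-rung ladder loss; hyperparameter defaults follow the
# values quoted in the Experiment Setup row. Not the authors' released code.
import torch


def two_rung_ladder_loss(sim, relevance, theta1=0.63,
                         alpha1=0.2, alpha2=0.01, beta1=1.0, beta2=0.25):
    """Compute a two-rung ladder loss for one retrieval direction.

    sim:       (B, B) similarity matrix; sim[i, i] is the ground-truth pair.
    relevance: (B, B) query-candidate relevance scores (e.g. from CBoW/BERT
               caption similarity) used to split negatives at theta1.
    """
    B = sim.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=sim.device)
    pos = sim.diag().unsqueeze(1)                        # (B, 1) positive scores

    # Rung 1: hardest-negative triplet loss with margin alpha1
    # (the positive must beat every negative by at least alpha1).
    hinge1 = (alpha1 - pos + sim).clamp(min=0).masked_fill(eye, 0)
    rung1 = hinge1.max(dim=1).values

    # Rung 2: margin alpha2 between the more-relevant negatives N^q_1
    # (relevance >= theta1) and the less-relevant negatives N^q_2.
    n1_mask = (~eye) & (relevance >= theta1)
    n2_mask = (~eye) & (relevance < theta1)
    easiest_n1 = sim.masked_fill(~n1_mask, float('inf')).min(dim=1).values
    hardest_n2 = sim.masked_fill(~n2_mask, float('-inf')).max(dim=1).values
    rung2 = (alpha2 - easiest_n1 + hardest_n2).clamp(min=0)

    return (beta1 * rung1 + beta2 * rung2).mean()
```

In the paper the embedding is trained for both retrieval directions, so a full objective would presumably sum this term over image-to-text and text-to-image (i.e. the term above plus the same term evaluated on the transposed similarity and relevance matrices).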