Cross-modal Bidirectional Translation via Reinforcement Learning

Authors: Jinwei Qi, Yuxin Peng

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval, compared with 11 state-of-the-art methods on 3 datasets." |
| Researcher Affiliation | Academia | Institute of Computer Science and Technology, Peking University, Beijing 100871, China (pengyuxin@pku.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Reinforcement training process of CBT |
| Open Source Code | No | Insufficient information. The paper does not provide any explicit statement about releasing source code, nor a link to a repository. |
| Open Datasets | Yes | Wikipedia dataset [Rasiwasia et al., 2010]; Pascal Sentence dataset [Rashtchian et al., 2010]; XMediaNet dataset [Peng et al., 2017a] |
| Dataset Splits | Yes | Wikipedia: 2,173 pairs for training, 231 for validation, 462 for testing. Pascal Sentence: 800 image/text pairs for training, 100 for validation, 100 for testing. XMediaNet: 32,000 pairs for training, 4,000 for validation, 4,000 for testing. |
| Hardware Specification | No | Insufficient information. The paper does not provide specific details about the hardware used for the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | Insufficient information. TensorFlow is mentioned, but no version number is provided for it or for any other software dependency. |
| Experiment Setup | Yes | The Word CNN contains 3 convolution layers, each followed by ReLU activation and max-pooling, with parameters (384, 15), (512, 9), (256, 7). The LSTMs for image and text each have two units in series, whose output has the same dimension as the input (300). Finally, the two-pathway network consists of 5 fully-connected layers (4,396, 3,000, 2,000, 1,000, 600) from image to text on each pathway. |
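The dataset splits quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only (the `splits` dictionary is ours); it tallies the reported pair counts per dataset and the resulting train fraction, using only the numbers stated in the table.

```python
# Reported splits per dataset: (train, validation, test) pair counts.
splits = {
    "Wikipedia": {"train": 2173, "val": 231, "test": 462},
    "Pascal Sentence": {"train": 800, "val": 100, "test": 100},
    "XMediaNet": {"train": 32000, "val": 4000, "test": 4000},
}

for name, s in splits.items():
    total = sum(s.values())
    train_frac = s["train"] / total
    print(f"{name}: {total} pairs total, train fraction {train_frac:.3f}")
```

Running this shows that Pascal Sentence and XMediaNet both use an 80/10/10 split, while Wikipedia's split is closer to 76/8/16.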
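The fully-connected layer widths reported for the two-pathway network imply a concrete parameter count per pathway, which can be tallied directly. The sketch below is our own bookkeeping, not the authors' code: the helper `dense_param_count` is hypothetical, and the widths are taken verbatim from the table above.

```python
# Layer widths of one pathway (image to text), as reported in the paper.
LAYER_WIDTHS = [4396, 3000, 2000, 1000, 600]

def dense_param_count(widths):
    """Total trainable parameters (weights + biases) for a stack of
    fully-connected layers with the given input/output widths."""
    total = 0
    for fan_in, fan_out in zip(widths, widths[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

print(dense_param_count(LAYER_WIDTHS))  # → 21794600
```

So each pathway carries roughly 21.8M dense parameters, the vast majority of them in the first (4,396 to 3,000) layer.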