Cross-modal Bidirectional Translation via Reinforcement Learning
Authors: Jinwei Qi, Yuxin Peng
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval, compared with 11 state-of-the-art methods on 3 datasets. |
| Researcher Affiliation | Academia | Institute of Computer Science and Technology, Peking University, Beijing 100871, China; pengyuxin@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1 Reinforcement training process of CBT |
| Open Source Code | No | Insufficient information. The paper does not provide any explicit statements about releasing source code or links to a repository. |
| Open Datasets | Yes | Wikipedia dataset [Rasiwasia et al., 2010]... Pascal Sentence dataset [Rashtchian et al., 2010]... XMedia Net dataset [Peng et al., 2017a] |
| Dataset Splits | Yes | Wikipedia dataset... 2,173 pairs for training, 231 for validation and 462 for testing. and Pascal Sentence dataset... 800 image/text pairs are selected for training, while 100 pairs for testing and 100 pairs for validation. and XMedia Net dataset... 32,000 pairs for training, 4,000 for testing and 4,000 for validation. |
| Hardware Specification | No | Insufficient information. The paper does not provide specific details about the hardware used for running experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | Insufficient information. While TensorFlow is mentioned, no specific version number is provided for it or any other software dependencies. |
| Experiment Setup | Yes | The Word CNN contains 3 convolution layers, each followed by ReLU activation and max-pooling, with parameters (384,15), (512,9), (256,7)... The LSTMs for image and text each have two units in series, whose output has the same dimension as the 300-d input... Finally, the two-pathway network consists of 5 fully-connected layers (4,396 → 3,000 → 2,000 → 1,000 → 600) from image to text on each pathway. |
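
The layer sizes quoted in the Experiment Setup row can be sketched in Keras (the paper mentions TensorFlow but gives no code). This is a minimal sketch, not the authors' implementation: it assumes the Word CNN tuples are (filters, kernel_size), that pooling size, padding, and fully-connected activations are placeholders, and that variable-length 300-d embeddings feed the text branch; only the layer widths come from the quoted text.

```python
# Minimal sketch of the quoted layer sizes.
# Assumptions (not stated in the paper): (filters, kernel_size) ordering,
# pool size 2, "same" padding, and ReLU on the fully-connected layers.
from tensorflow.keras import layers, models


def word_cnn(embed_dim=300):
    # Three convolution layers (384,15), (512,9), (256,7),
    # each followed by ReLU activation and max-pooling.
    inp = layers.Input(shape=(None, embed_dim))
    x = inp
    for filters, kernel in [(384, 15), (512, 9), (256, 7)]:
        x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
        x = layers.MaxPooling1D(pool_size=2, padding="same")(x)
    return models.Model(inp, x, name="word_cnn")


def stacked_lstm(dim=300):
    # Two LSTM units in series whose output keeps the 300-d input dimension.
    inp = layers.Input(shape=(None, dim))
    x = layers.LSTM(dim, return_sequences=True)(inp)
    x = layers.LSTM(dim)(x)
    return models.Model(inp, x, name="stacked_lstm")


def pathway(feature_dim):
    # Five fully-connected layers 4,396 -> 3,000 -> 2,000 -> 1,000 -> 600.
    # Whether 4,396 is the input dimension or the first layer width is not
    # explicit in the quoted text; here it is treated as the first layer width.
    inp = layers.Input(shape=(feature_dim,))
    x = inp
    for units in [4396, 3000, 2000, 1000, 600]:
        x = layers.Dense(units, activation="relu")(x)
    return models.Model(inp, x, name="pathway")
```

One such pathway would be instantiated per translation direction (image-to-text and text-to-image); the input feature dimension is left as a parameter because the paper excerpt does not specify it.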