Dependent Multi-Task Learning with Causal Intervention for Image Captioning
Authors: Wenqing Chen, Jidong Tian, Caoyun Fan, Hao He, Yaohui Jin
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments show that our model outperforms the baseline models and achieves competitive performance with state-of-the-art models. |
| Researcher Affiliation | Academia | Wenqing Chen^{1,2}, Jidong Tian^{1,2}, Caoyun Fan^{1,2}, Hao He^{1,2} and Yaohui Jin^{1,2}; 1: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; 2: State Key Lab of Advanced Optical Communication System and Network, Shanghai Jiao Tong University |
| Pseudocode | No | The paper describes its model and approach in detail but does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not explicitly state that its source code for the described methodology is available. It mentions 'publicly released code' for metrics and 'officially released codes' for reproducing other models, but not its own. |
| Open Datasets | Yes | We experiment on the MSCOCO dataset, which is the most popular dataset for image captioning. The original dataset contains about 82,783 training images and 40,504 validation images. |
| Dataset Splits | Yes | Following most of the previous work, we first evaluate our model on the Karpathy data split [Karpathy and Li, 2015] with 5,000 images for validation, 5,000 images for testing, and the rest for training. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used to run its experiments. |
| Software Dependencies | No | The paper mentions using 'publicly released code' for metrics but does not provide specific version numbers for any software dependencies or libraries used in its own implementation. |
| Experiment Setup | Yes | The learning rate is initialized to 0.0001 and decreased by half when the CIDEr-D score does not increase in 2 epochs, with the minimum learning rate set to 5e-6. The batch size is set to 50. The model is firstly optimized with MLE for 30 epochs (Gumbel sampling f_m after 15 epochs), and then optimized with MARL for another 35 epochs. |
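
The Experiment Setup row maps onto a standard plateau-based learning-rate policy. Below is a minimal, hypothetical PyTorch sketch of that schedule, not the authors' implementation (the paper releases no code); the model, `train_mle_epoch`, `train_marl_epoch`, and `evaluate_cider` are placeholders. It assumes an optimizer starting at 1e-4, the rate halved when validation CIDEr-D fails to improve for 2 epochs with a floor of 5e-6, 30 MLE epochs followed by 35 MARL epochs, and Gumbel sampling enabled after epoch 15.

```python
# Hypothetical sketch of the reported training schedule; not the authors' code.
import torch
import torch.nn as nn

model = nn.Linear(10, 10)                                   # placeholder captioning model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # initial LR 0.0001
# Halve the LR when validation CIDEr-D has not improved for 2 epochs,
# never going below 5e-6 (mode="max" because higher CIDEr-D is better).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=2, min_lr=5e-6)

def evaluate_cider(model):
    """Placeholder: would compute CIDEr-D on the validation split."""
    return 1.0

def train_mle_epoch(model, optimizer, use_gumbel):
    """Placeholder: one epoch of MLE training (batch size 50 in the paper)."""

def train_marl_epoch(model, optimizer):
    """Placeholder: one epoch of MARL optimization."""

# Phase 1: 30 epochs of MLE; Gumbel sampling is switched on after epoch 15.
for epoch in range(30):
    train_mle_epoch(model, optimizer, use_gumbel=(epoch >= 15))
    scheduler.step(evaluate_cider(model))

# Phase 2: 35 further epochs optimized with MARL, same plateau schedule.
for epoch in range(35):
    train_marl_epoch(model, optimizer)
    scheduler.step(evaluate_cider(model))
```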