Recurrent Nested Model for Sequence Generation

Authors: Wenhao Jiang, Lin Ma, Wei Lu

AAAI 2020, pp. 11117-11124 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The advantages of our model are illustrated on the image captioning and code captioning tasks. (Experiments) We illustrate the advantages of our model on two tasks, i.e., image captioning and code captioning. The types of encoders are different for models on those two tasks.
Researcher Affiliation | Collaboration | (1) Tencent AI Lab, (2) University of Electronic Science and Technology of China
Pseudocode | No | The paper does not contain explicitly labeled "Pseudocode" or "Algorithm" blocks. While equations are presented for the LSTM, they are not structured as an algorithm.
Open Source Code | No | The paper states, "The bug is fixed in our code" in a footnote regarding metric computation for a previous work, but it does not provide an explicit statement or link for the open-source code of the methodology described in this paper.
Open Datasets | Yes | The MS COCO dataset is the largest dataset for the image captioning task. This dataset contains 82,783, 40,504 and 40,775 images for the training, validation and test sets, respectively. The Habeas Corpus (Movshovitz-Attias and Cohen 2013) dataset is used to illustrate the effectiveness of our method.
Dataset Splits | Yes | The MS COCO dataset contains 82,783, 40,504 and 40,775 images for the training, validation and test sets, respectively. For offline evaluation, we follow the conventional evaluation procedure ... and employ the Karpathy split (Karpathy and Fei-Fei 2015), which contains 5,000 images for validation, 5,000 images for test and 113,287 images for training. (See the split sanity check after this table.)
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper mentions using evaluation scripts like the MS COCO caption evaluation scripts and the SPICE source code, and optimizers like Adam, but it does not provide specific version numbers for these or any other software dependencies needed for replication.
Experiment Setup | Yes | For scheduled sampling, the probability of sampling a token from the model is min(0.25, epoch / 100), where epoch is the number of passes over the training data. For LSR, the prior distribution over labels is uniform and the smoothing parameter is set to 0.1. Dropout is applied only on the hidden states in the reviewer and decoder, with probability 0.3. The LSTM size is set to 512 for all LSTM units in our model. The Adam ... is applied to optimize the network; the learning rate is set to 5e-4 and decays every 3 epochs by a factor of 0.8 when training with the cross-entropy loss. Each mini-batch contains 10 images. (See the training configuration sketch after this table.)
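
As a sanity check on the split sizes quoted in the Dataset Splits row, the Karpathy split simply repartitions the official MS COCO train and validation images; below is a minimal Python sketch. Only the counts come from the quotes above; the dictionary and function names are illustrative, not from the paper.

```python
# Official COCO 2014 sets: 82,783 train + 40,504 val = 123,287 images,
# which the Karpathy split repartitions into 113,287 / 5,000 / 5,000.
# The official 40,775 test images are held out for the online server.

COCO_OFFICIAL = {"train": 82_783, "val": 40_504, "test": 40_775}
KARPATHY_SPLIT = {"train": 113_287, "val": 5_000, "test": 5_000}

def check_karpathy_split() -> int:
    """Verify that the Karpathy split uses exactly the official train+val images."""
    available = COCO_OFFICIAL["train"] + COCO_OFFICIAL["val"]
    used = sum(KARPATHY_SPLIT.values())
    assert available == used == 123_287
    return used

if __name__ == "__main__":
    print(f"Images covered by the Karpathy split: {check_karpathy_split():,}")
```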
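
The hyperparameters quoted in the Experiment Setup row map directly onto a training configuration. Below is a minimal sketch assuming a PyTorch-style setup; the use of PyTorch and all helper names are assumptions, and only the numeric values come from the row above.

```python
import torch

# Values quoted in the Experiment Setup row (cross-entropy training stage).
LSTM_SIZE = 512          # hidden size for all LSTM units
DROPOUT_P = 0.3          # dropout on reviewer/decoder hidden states
LABEL_SMOOTHING = 0.1    # LSR with a uniform prior over labels
BATCH_SIZE = 10          # images per mini-batch
BASE_LR = 5e-4           # Adam learning rate
LR_DECAY_EVERY = 3       # epochs between learning-rate decays
LR_DECAY_FACTOR = 0.8    # multiplicative decay factor

def scheduled_sampling_prob(epoch: int) -> float:
    """Probability of feeding the model its own sampled token instead of the
    ground-truth token: min(0.25, epoch / 100)."""
    return min(0.25, epoch / 100)

def build_optimizer(model: torch.nn.Module):
    """Adam with a step decay of 0.8 every 3 epochs (hypothetical helper)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=LR_DECAY_EVERY, gamma=LR_DECAY_FACTOR)
    return optimizer, scheduler

# Label-smoothed cross-entropy with a uniform prior (requires PyTorch >= 1.10).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)
```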