Trainable Decoding of Sets of Sequences for Neural Sequence Models

Authors: Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we show results on the image captioning task and find that our model outperforms standard techniques and natural ablations.
Researcher Affiliation | Collaboration | School of Interactive Computing, Georgia Tech, Atlanta, GA, USA; Facebook AI Research, Menlo Park, CA, USA.
Pseudocode | Yes | Algorithm 1: Sequential Subset Selection
Open Source Code | Yes | Pronounced diff-BS; code available at https://github.com/ashwinkalyan/diff-bs
Open Datasets | Yes | Datasets and Models. We show results on three captioning datasets of increasing size: Flickr8k, Flickr30k (Young et al., 2014), and the large-scale COCO dataset (Lin et al., 2014).
Dataset Splits | Yes | For the first two Flickr datasets, 1000 images each are used for validation and testing, while the rest (6000 and 28000 respectively) are used for training. For COCO, a similar split is used, but 5000 images each are used for validation and testing.
Hardware Specification | No | The paper mentions training models but does not specify any hardware details such as GPU/CPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies | No | The paper mentions training with Adam and using an LSTM, but it does not provide version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Both the DSF and the LSTM (in the case of EE) are trained using Adam (Kingma & Ba, 2014) with learning rates of 1e-4 and 1e-5, respectively. We set the beam size K = 5 in all our experiments. As mentioned in Section 2, we first do a coarse selection using a standard sequence model, inputting only the top-100 alternatives corresponding to each partial solution to the DSF.
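
The Pseudocode and Experiment Setup rows describe a sequential subset-selection step that scores sets of beam candidates after a coarse top-100 filter. The sketch below is a generic greedy illustration of that idea, not a reproduction of the paper's Algorithm 1; the function name, data shapes, and the `set_score` interface are assumptions for illustration only.

```python
from typing import Callable, List, Tuple


def greedy_subset_selection(
    candidates: List[Tuple[List[int], float]],              # (partial sequence, log-prob) pairs
    set_score: Callable[[List[Tuple[List[int], float]]], float],  # learned set-scoring function
    beam_size: int = 5,
    num_candidates: int = 100,
) -> List[Tuple[List[int], float]]:
    # Coarse selection: keep only the highest log-probability extensions.
    pool = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_candidates]

    selected: List[Tuple[List[int], float]] = []
    for _ in range(min(beam_size, len(pool))):
        # Greedily add the candidate with the largest marginal gain in set score.
        best = max(pool, key=lambda c: set_score(selected + [c]) - set_score(selected))
        selected.append(best)
        pool.remove(best)
    return selected


if __name__ == "__main__":
    # Toy usage: summing log-probabilities recovers plain beam-search pruning.
    toy = [([1, 2], -0.5), ([1, 3], -1.2), ([4, 5], -0.9)]
    print(greedy_subset_selection(toy, lambda s: sum(lp for _, lp in s), beam_size=2))
```

With `set_score` taken as the sum of candidate log-probabilities, the selection reduces to ordinary beam-search pruning; a trainable set-scoring function such as the paper's DSF can instead trade likelihood against other set-level criteria.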
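The Experiment Setup row also fixes the optimization hyperparameters. Below is a minimal PyTorch sketch of that configuration; the module definitions are placeholders standing in for the DSF and the captioning LSTM, and only the learning rates, beam size, and top-100 filter come from the quoted text.

```python
# Minimal sketch of the quoted training/decoding configuration (PyTorch assumed).
# The module definitions below are placeholders, not the released models.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=512, hidden_size=512)           # stand-in for the captioning LSTM
dsf = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                    nn.Linear(256, 1))                     # stand-in for the trainable set-scoring function (DSF)

# Adam with the reported learning rates: 1e-4 for the DSF, 1e-5 for the LSTM (end-to-end case).
dsf_optimizer = torch.optim.Adam(dsf.parameters(), lr=1e-4)
lstm_optimizer = torch.optim.Adam(lstm.parameters(), lr=1e-5)

BEAM_SIZE = 5            # K = 5 in all experiments
NUM_CANDIDATES = 100     # coarse selection: only the top-100 extensions per partial solution reach the DSF
```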