Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Authors: Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results with our solution on image captioning, code generation, text summarization, and machine translation tasks suggest that with similar hyperparameters, our algorithm is capable of recovering autoregressive orders that are even better than fixed orders. |
| Researcher Affiliation | Academia | Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo (University of California, Berkeley) {xuanlinli17, btrabucco, dong.huk.park, michael.luo}@berkeley.edu; Sheng Shen, Trevor Darrell, Yang Gao (University of California, Berkeley; Tsinghua University) {sheng.s, trevordarrell}@berkeley.edu, gy20073@gmail.com |
| Pseudocode | Yes | Algorithm 1: Variational Order Inference |
| Open Source Code | Yes | Our experimental framework is available at this link. |
| Open Datasets | Yes | For NL2Code, we use Django (Oda et al., 2015). For image captioning, we use COCO 2017 (Lin et al., 2015). For text summarization, we use English Gigaword (Graff et al., 2003; Rush et al., 2015). For machine translation, we use WMT16 Romanian-English (Ro-En). |
| Dataset Splits | Yes | We compare metrics as a function of the sequence length of generated captions on the COCO 2017 validation set. |
| Hardware Specification | Yes | We compare the runtime performance of VOI (K = 4) with SAO on a single Tesla P100 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Torchvision, but does not specify version numbers for the main deep learning framework (e.g., PyTorch, TensorFlow) or for other critical software dependencies required for reproduction. |
| Experiment Setup | Yes | For our decoder, we set d_model = 512, d_hidden = 2048, 6 layers for both the Transformer's encoder and decoder, and 8 attention heads. This is the same model configuration as Transformer Base (Vaswani et al., 2017) and as described in Gu et al. (2019a). Our encoder also uses the same configuration. For our model trained with Variational Order Inference, we sample K = 4 latents for each training sample. (See the configuration sketch below.) |
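To make the Experiment Setup row concrete, here is a minimal configuration sketch using the reported sizes (d_model = 512, d_hidden = 2048, 6 encoder and 6 decoder layers, 8 attention heads, K = 4 sampled orderings per training example). PyTorch and `nn.Transformer` are assumptions chosen only for illustration; as noted under Software Dependencies, the paper does not name its framework, and this sketch reproduces only the stated model dimensions, not Variational Order Inference itself.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the Experiment Setup row above
# (Transformer Base configuration, Vaswani et al., 2017).
D_MODEL = 512      # model (embedding) dimension
D_HIDDEN = 2048    # feed-forward hidden dimension
NUM_LAYERS = 6     # layers in both encoder and decoder
NUM_HEADS = 8      # attention heads
K_LATENTS = 4      # orderings sampled per training example (not used by this sketch)

# Illustrative encoder-decoder with the stated sizes. How VOI conditions the
# decoder on sampled orderings is not described in this excerpt, so it is not
# modeled here.
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=NUM_HEADS,
    num_encoder_layers=NUM_LAYERS,
    num_decoder_layers=NUM_LAYERS,
    dim_feedforward=D_HIDDEN,
    batch_first=True,
)

# Example forward pass with dummy token embeddings.
src = torch.randn(2, 20, D_MODEL)  # (batch, source length, d_model)
tgt = torch.randn(2, 15, D_MODEL)  # (batch, target length, d_model)
out = model(src, tgt)              # -> (2, 15, D_MODEL)
```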