ProTo: Program-Guided Transformer for Program-Guided Tasks

Authors: Zelin Zhao, Karan Samel, Binghong Chen, Le Song

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that ProTo significantly outperforms the previous state-of-the-art methods on GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo demonstrates better generalization to unseen, complex, and human-written programs. We evaluate ProTo on two tasks, program-guided visual reasoning and program-guided policy learning (corresponding to Figure 1 left and Figure 1 right).
Researcher Affiliation | Collaboration | Zelin Zhao, The Chinese University of Hong Kong, zelin@link.cuhk.edu.hk; Karan Samel, Georgia Institute of Technology, ksamel@gatech.edu; Binghong Chen, Georgia Institute of Technology, binghong@gatech.edu; Le Song, BioMap and MBZUAI, dasongle@gmail.com
Pseudocode | Yes | Algorithm 1: ProTo Execution
Open Source Code | No | We will release the code and pre-trained models after publishing.
Open Datasets | Yes | We conduct experiments of program-guided visual reasoning based on the public GQA dataset [47] consisting of 22 million questions over 140 thousand images. It is divided into training, validation, and testing splits.
Dataset Splits | Yes | The GQA dataset [47] consists of 22 million questions over 140 thousand images and is divided into training, validation, and testing splits. On the training split, we train a transformer-based seq2seq model [87] to parse a question into a program. For validation and testing, we use this trained seq2seq model to acquire a program from a question. (A sketch of this parsing step appears after the table.)
Hardware Specification | No | The paper does not explicitly state specific hardware components (such as GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The optimizer is the BERT Adam optimizer [24] with a base learning rate of 1 × 10⁻⁴, which is decayed by a factor of 0.5 every epoch. To alleviate over-fitting, we adopt an L2 weight decay of 0.01.
Experiment Setup | Yes | We take N = 50 object features (provided by the GQA dataset) with d = 2048 dimensions. The optimizer is the BERT Adam optimizer [24] with a base learning rate of 1 × 10⁻⁴, which is decayed by a factor of 0.5 every epoch. To alleviate over-fitting, we adopt an L2 weight decay of 0.01. The model is trained for 20 epochs on the training split, and the best model evaluated on the validation split is submitted to the public evaluation server to get testing results. (A training-configuration sketch follows the table.)
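
The Dataset Splits row mentions training a transformer-based seq2seq model to map each GQA question to a program. The paper gives no implementation details for this component, so the following is a minimal illustrative sketch in PyTorch; the vocabulary sizes, model dimensions, special-token ids, and greedy decoding loop are all assumptions made for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    """Toy question-to-program parser; all sizes are hypothetical."""
    def __init__(self, src_vocab=5000, tgt_vocab=200, d_model=256,
                 nhead=8, num_layers=3, max_len=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead, num_encoder_layers=num_layers,
            num_decoder_layers=num_layers, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def _embed(self, ids, table):
        pos = torch.arange(ids.size(1), device=ids.device)
        return table(ids) + self.pos_embed(pos)

    def forward(self, src_ids, tgt_ids):
        t = tgt_ids.size(1)
        causal = torch.triu(  # block attention to future program tokens
            torch.full((t, t), float("-inf"), device=tgt_ids.device),
            diagonal=1)
        hidden = self.transformer(self._embed(src_ids, self.src_embed),
                                  self._embed(tgt_ids, self.tgt_embed),
                                  tgt_mask=causal)
        return self.out(hidden)  # logits over program tokens

@torch.no_grad()
def greedy_parse(model, question_ids, bos_id=1, eos_id=2, max_len=32):
    """Decode a program token-by-token, as done at validation/test time."""
    program = torch.tensor([[bos_id]])
    for _ in range(max_len):
        next_id = model(question_ids, program)[:, -1].argmax(-1, keepdim=True)
        program = torch.cat([program, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return program[0, 1:]  # drop BOS; sequence may end with EOS
```

At training time the decoder would be teacher-forced on ground-truth programs from the training split; the greedy loop above mirrors the validation/testing usage the row describes, where the trained parser produces a program from each question.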
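
The Experiment Setup row pins down the optimizer: BERT Adam with a base learning rate of 1 × 10⁻⁴ halved every epoch, an L2 weight decay of 0.01, and 20 training epochs. BERT Adam behaves like Adam with decoupled weight decay, so one plausible reading in PyTorch pairs AdamW with a per-epoch StepLR schedule; the model and data below are dummies standing in for ProTo and the GQA loader.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# Dummy stand-ins: the paper uses N = 50 object features of dimension
# d = 2048; the answer-vocabulary size (1000 here) is a made-up number.
model = nn.Sequential(nn.Flatten(), nn.Linear(50 * 2048, 512),
                      nn.ReLU(), nn.Linear(512, 1000))
train_loader = [(torch.randn(8, 50, 2048), torch.randint(0, 1000, (8,)))]

# BERT Adam ~= Adam with decoupled L2 weight decay (warmup omitted here).
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = StepLR(optimizer, step_size=1, gamma=0.5)  # x0.5 per epoch

for epoch in range(20):  # "trained for 20 epochs on the training split"
    for features, labels in train_loader:
        loss = nn.functional.cross_entropy(model(features), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the base learning rate by 0.5 each epoch
```

Note that halving every epoch drives the rate down to about 1e-4 × 0.5¹⁹ ≈ 2 × 10⁻¹⁰ by epoch 20, so the early epochs dominate learning; the original BERT Adam also skips Adam's bias correction, a detail this sketch ignores.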