ProTo: Program-Guided Transformer for Program-Guided Tasks
Authors: Zelin Zhao, Karan Samel, Binghong Chen, Le Song
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that ProTo significantly outperforms the previous state-of-the-art methods on GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo demonstrates better generalization to unseen, complex, and human-written programs. We evaluate ProTo on two tasks, program-guided visual reasoning and program-guided policy learning (corresponding to Figure 1 left and Figure 1 right). |
| Researcher Affiliation | Collaboration | Zelin Zhao, The Chinese University of Hong Kong, zelin@link.cuhk.edu.hk; Karan Samel, Georgia Institute of Technology, ksamel@gatech.edu; Binghong Chen, Georgia Institute of Technology, binghong@gatech.edu; Le Song, Biomap and MBZUAI, dasongle@gmail.com |
| Pseudocode | Yes | Algorithm 1: ProTo Execution |
| Open Source Code | No | We will release the code and pre-trained models after publishing. |
| Open Datasets | Yes | We conduct experiments of program-guided visual reasoning based on the public GQA dataset [47] consisting of 22 million questions over 140 thousand images. It is divided into training, validation, and testing splits. |
| Dataset Splits | Yes | The GQA dataset [47] consists of 22 million questions over 140 thousand images. It is divided into training, validation, and testing splits. On the training split, we train a transformer-based seq2seq model [87] to parse a question into a program. For validation and testing, we use this trained seq2seq model to acquire a program from a question. (A minimal sketch of such a parser follows the table.) |
| Hardware Specification | No | The paper does not explicitly state specific hardware components (like GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The optimizer is the BERT Adam optimizer [24] with a base learning rate of 1 × 10^-4, which is decayed by a factor of 0.5 every epoch. To alleviate over-fitting, we adopt an L2 weight decay of 0.01. |
| Experiment Setup | Yes | We take N = 50 object features (provided by the GQA dataset) with d = 2048 dimension. The optimizer is the BERT Adam optimizer [24] with a base learning rate of 1 × 10^-4, which is decayed by a factor of 0.5 every epoch. To alleviate over-fitting, we adopt an L2 weight decay of 0.01. The model is trained for 20 epochs on the training split, and the best model evaluated on the validation split is submitted to the public evaluation server to get testing results. (A minimal training-loop sketch follows the table.) |
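
The Dataset Splits row mentions a transformer-based seq2seq model [87] that parses a GQA question into a program. The authors' code was not released, so the following is only a minimal PyTorch sketch of such a parser; the class name, vocabulary sizes, and layer counts are all assumptions, not details from the paper.

```python
import torch
from torch import nn

class Seq2SeqProgramParser(nn.Module):
    """Minimal transformer seq2seq mapping question tokens to program tokens.

    Sketch only: sizes and layer counts are hypothetical, not the paper's.
    Positional encodings are omitted for brevity.
    """

    def __init__(self, q_vocab=30000, p_vocab=2000, d_model=256):
        super().__init__()
        self.src_embed = nn.Embedding(q_vocab, d_model)
        self.tgt_embed = nn.Embedding(p_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, p_vocab)

    def forward(self, question_ids, program_ids):
        # Causal mask: each program token attends only to earlier tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            program_ids.size(1))
        h = self.transformer(
            self.src_embed(question_ids),
            self.tgt_embed(program_ids),
            tgt_mask=tgt_mask,
        )
        return self.out(h)  # (batch, program_len, p_vocab) logits


# Usage with dummy token ids (teacher forcing during training).
parser = Seq2SeqProgramParser()
question = torch.randint(0, 30000, (4, 20))  # batch of 4 questions
program = torch.randint(0, 2000, (4, 12))    # program token prefixes
logits = parser(question, program)           # -> torch.Size([4, 12, 2000])
```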
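
The Experiment Setup row pins down optimizer hyperparameters that translate directly into standard PyTorch components. The sketch below is not the authors' training code: it assumes AdamW as a stand-in for the BERT Adam optimizer [24] (both use decoupled weight decay) and `StepLR` for the per-epoch 0.5 decay, with a toy model and dummy data in place of the unreleased ProTo model and GQA loader.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# Hyperparameters quoted from the Experiment Setup row.
BASE_LR = 1e-4          # base learning rate, 1 x 10^-4
WEIGHT_DECAY = 0.01     # L2 weight decay against over-fitting
DECAY_FACTOR = 0.5      # learning-rate decay factor applied each epoch
NUM_EPOCHS = 20
N_OBJECTS, FEAT_DIM = 50, 2048   # N = 50 object features, d = 2048 dims
NUM_ANSWERS = 1000      # hypothetical answer-vocabulary size

# Toy stand-in for the ProTo model (the real code was not released).
model = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                      nn.Linear(256, NUM_ANSWERS))

# AdamW approximates the BERT Adam optimizer [24] cited in the paper.
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

# Decay the learning rate by a factor of 0.5 every epoch.
scheduler = StepLR(optimizer, step_size=1, gamma=DECAY_FACTOR)

for epoch in range(NUM_EPOCHS):
    # Dummy batch of object features in place of a real GQA DataLoader.
    feats = torch.randn(8, N_OBJECTS, FEAT_DIM)
    labels = torch.randint(0, NUM_ANSWERS, (8,))
    logits = model(feats).mean(dim=1)  # pool over objects, toy answer head
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```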