Tuning Computer Vision Models With Task Rewards
Authors: André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate some of our key results in Figure 1, highlighting both quantitative and qualitative improvements brought by reward optimization for object detection, panoptic segmentation, and image colorization. We believe this approach has the potential to be widely useful for better aligning models with a diverse range of computer vision tasks. |
| Researcher Affiliation | Collaboration | 1Google DeepMind, Zurich, Switzerland 2Work done during internship while being a PhD student at the University of Oxford. Correspondence to: André Susano Pinto <andresp@google.com>, Alexander Kolesnikov <akolesnikov@google.com>. |
| Pseudocode | Yes | Algorithm 1 MLE optimization step and Algorithm 2 Reward optimization step. |
| Open Source Code | Yes | We use the big vision codebase (Beyer et al., 2022) for all experiments in this project. (Reference: Beyer, L., Zhai, X., and Kolesnikov, A. Big Vision. https://github.com/google-research/big_vision, 2022.) |
| Open Datasets | Yes | We pretrain the model on the Objects365 dataset (Shao et al., 2019) and further finetune on the COCO (Lin et al., 2014) dataset. The model was trained using MLE on COCO panoptic dataset. |
| Dataset Splits | Yes | Table 1. Panoptic segmentation results on COCO panoptic validation set after reward optimization. (and) Analysis of reward distribution before and after tuning the model for the image captioning task. Measured as mean of 1024 validation examples. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Adafactor' and 'BERT' and the 'big vision codebase', but does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Panoptic segmentation: We use the REINFORCE rule to tune the MLE model for this reward, with a batch size of 128 for 30k steps with constant learning rate 10^-6 after a warmup of 4k steps. (and) Appendix A provides detailed tables, e.g., 'Table 6. Object detection settings. MLE Objects365 pretraining — RESOLUTION: 640x640; BATCH SIZE: 256; LEARNING-RATE: 1e-3; WEIGHT-DECAY: 5e-5; SCHEDULE: COSINE; TOTAL STEPS: 400 000; WARMUP STEPS: 20 000'. |
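The reward-optimization step quoted above follows the standard REINFORCE recipe: sample outputs from the model, score them with a task reward, and ascend the reward-weighted log-likelihood gradient with a baseline for variance reduction. Below is a minimal sketch of that update on a toy categorical "policy"; this is an illustrative assumption, not the paper's big_vision implementation, and `reinforce_step`, `reward_fn`, and the toy setup are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(logits, reward_fn, lr=0.1, batch_size=128):
    """One REINFORCE update on a categorical policy over K outputs.

    For a softmax policy, grad log p(y) = onehot(y) - p. We average the
    reward-weighted score-function gradient over sampled outputs, using
    the batch-mean reward as a baseline to reduce variance.
    """
    probs = softmax(logits)
    samples = rng.choice(len(logits), size=batch_size, p=probs)
    rewards = np.array([reward_fn(y) for y in samples])
    baseline = rewards.mean()
    grad = np.zeros_like(logits)
    for y, r in zip(samples, rewards):
        onehot = np.zeros_like(logits)
        onehot[y] = 1.0
        grad += (r - baseline) * (onehot - probs)
    grad /= batch_size
    # Gradient ascent on expected reward (the paper tunes with a much
    # smaller learning rate, 1e-6, on a full segmentation model).
    return logits + lr * grad

# Toy reward: 1.0 for output 2, else 0.0.
reward_fn = lambda y: 1.0 if y == 2 else 0.0
logits = np.zeros(4)
for _ in range(500):
    logits = reinforce_step(logits, reward_fn)
probs = softmax(logits)
print(probs.argmax())  # the rewarded output should dominate
```

The batch-mean baseline here stands in for the variance-reduction term; the same structure applies when the samples are predicted segmentations or captions and `reward_fn` is a task metric.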