Tuning Computer Vision Models With Task Rewards

Authors: André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate some of our key results in Figure 1, highlighting both quantitative and qualitative improvements brought by reward optimization for object detection, panoptic segmentation, and image colorization. We believe this approach has the potential to be widely useful for better aligning models with a diverse range of computer vision tasks."
Researcher Affiliation | Collaboration | "¹Google DeepMind, Zurich, Switzerland. ²Work done during internship while being a PhD student at the University of Oxford. Correspondence to: André Susano Pinto <andresp@google.com>, Alexander Kolesnikov <akolesnikov@google.com>."
Pseudocode | Yes | "Algorithm 1 MLE optimization step" and "Algorithm 2 Reward optimization step" (a hedged sketch of the reward step appears below the table).
Open Source Code | Yes | "We use the big vision codebase (Beyer et al., 2022) for all experiments in this project." (Reference: Beyer, L., Zhai, X., and Kolesnikov, A. Big Vision. https://github.com/google-research/big_vision, 2022.)
Open Datasets | Yes | "We pretrain the model on the Objects365 dataset (Shao et al., 2019) and further finetune on the COCO (Lin et al., 2014) dataset." and "The model was trained using MLE on COCO panoptic dataset."
Dataset Splits | Yes | "Table 1. Panoptic segmentation results on COCO panoptic validation set after reward optimization." and "Analysis of reward distribution before and after tuning the model for the image captioning task. Measured as mean of 1024 validation examples."
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions software components such as Adafactor, BERT, and the big_vision codebase, but does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "Panoptic segmentation: We use REINFORCE rule to tune the MLE model for this reward, with a batch size of 128 for 30k steps with constant learning rate 10^-6 after a warmup of 4k steps." Appendix A provides detailed tables, e.g., Table 6 (object detection settings), which specifies MLE Objects365 pretraining at resolution 640x640 with batch size 256, learning rate 1e-3, weight decay 5e-5, a cosine schedule, 400,000 total steps, and 20,000 warmup steps (collected into a config sketch below).
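
To make the Pseudocode row concrete, here is a minimal sketch of a REINFORCE-style reward-optimization step of the kind the paper describes as Algorithm 2, written in JAX since big_vision builds on it. The function names `sample_fn`, `log_prob_fn`, and `reward_fn`, as well as the per-input mean baseline, are illustrative assumptions, not the paper's or the repository's actual API.

```python
import jax
import jax.numpy as jnp

def reward_optimization_step(params, inputs, rng,
                             sample_fn, log_prob_fn, reward_fn,
                             num_samples=8):
    """One REINFORCE update: sample outputs, score them with the task
    reward, and reinforce above-baseline samples via the log-likelihood
    gradient. All three *_fn arguments are hypothetical placeholders."""
    # Draw candidate outputs per input from the current model.
    samples = sample_fn(params, inputs, rng, num_samples)   # [B, S, ...]
    rewards = reward_fn(inputs, samples)                    # [B, S]
    # Per-input mean reward as a simple variance-reducing baseline
    # (an assumption; the paper may use a different baseline).
    baseline = jnp.mean(rewards, axis=1, keepdims=True)
    advantages = rewards - baseline                         # [B, S]

    def loss_fn(p):
        log_probs = log_prob_fn(p, inputs, samples)         # [B, S]
        # Advantages are treated as constants: REINFORCE only scales
        # the gradient of the log-likelihood.
        return -jnp.mean(jax.lax.stop_gradient(advantages) * log_probs)

    loss, grads = jax.value_and_grad(loss_fn)(params)
    # The caller applies `grads` with its optimizer (the paper mentions
    # Adafactor).
    return loss, grads
```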
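The hyperparameters quoted in the Experiment Setup row can be collected into a big_vision-style config sketch. This is a hedged illustration only: the key names are made up for readability and are not the repository's exact configuration schema, though the values match the quoted text.

```python
# Reward tuning of the MLE panoptic model (values quoted from the paper).
panoptic_reward_tuning = dict(
    objective="reinforce",     # tune the MLE model against the task reward
    batch_size=128,
    total_steps=30_000,
    learning_rate=1e-6,        # constant after warmup
    warmup_steps=4_000,
)

# MLE Objects365 pretraining for object detection (Appendix A, Table 6).
detection_mle_pretraining = dict(
    resolution=(640, 640),
    batch_size=256,
    learning_rate=1e-3,
    weight_decay=5e-5,
    schedule="cosine",
    total_steps=400_000,
    warmup_steps=20_000,
)
```

Note the contrast between the two stages: reward tuning uses a learning rate three orders of magnitude smaller than MLE pretraining, consistent with it being a short alignment phase on top of an already-trained model.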