Iterative Patch Selection for High-Resolution Image Recognition

Authors: Benjamin Bergner, Christoph Lippert, Aravindh Mahendran

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance and efficiency of our method on three challenging datasets from a variety of domains and training regimes: Multi-class recognition of distant traffic signs in megapixel images, weakly-supervised classification in gigapixel whole-slide images (WSI) using self-supervised representations, and multi-task learning of inter-patch relations on a synthetic megapixel MNIST benchmark.
Researcher Affiliation | Collaboration | Benjamin Bergner (1), Christoph Lippert (1, 2), Aravindh Mahendran (3); (1) Hasso Plattner Institute for Digital Engineering, University of Potsdam; (2) Hasso Plattner Institute for Digital Health at the Icahn School of Medicine at Mount Sinai; (3) Google Research, Brain Team
Pseudocode | Yes | Appendix A (Pseudocode), Algorithm 1: Pseudocode for IPS and Patch Aggregation. (A minimal sketch of the selection loop is given below the table.)
Open Source Code | Yes | We discuss these in detail next and provide code at https://github.com/benbergner/ips.
Open Datasets | Yes | We first evaluate our method on the Swedish traffic signs dataset, which consists of 747 training and 684 test images with 1.3 megapixel resolution, as in Katharopoulos & Fleuret (2019). ... Next, we consider the CAMELYON16 dataset (Litjens et al., 2018), which consists of 270 training and 129 test WSIs of gigapixel resolution... megapixel MNIST introduced in Katharopoulos & Fleuret (2019) requires the recognition of multiple patches and their relations. The dataset consists of 5,000 training and 1,000 test images of size 1,500 × 1,500.
Dataset Splits | Yes | We first evaluate our method on the Swedish traffic signs dataset, which consists of 747 training and 684 test images with 1.3 megapixel resolution... Next, we consider the CAMELYON16 dataset (Litjens et al., 2018), which consists of 270 training and 129 test WSIs... megapixel MNIST introduced in Katharopoulos & Fleuret (2019)... consists of 5,000 training and 1,000 test images...
Hardware Specification | Yes | Both metrics are calculated on a single NVIDIA A100 GPU in all experiments.
Software Dependencies | No | The paper mentions using `torch.cuda.Event` for timing, implying the use of PyTorch, but it does not specify version numbers for PyTorch, CUDA, or other software dependencies. (A hypothetical `torch.cuda.Event` timing helper is sketched below the table.)
Experiment Setup | Yes | All models are trained for 150 epochs (megapixel MNIST, traffic signs) or 50 epochs (CAMELYON16) on the respective training sets... The batch size is 16, and AdamW with weight decay of 0.1 is used as optimizer (Loshchilov & Hutter, 2017). After a linear warm-up period of 10 epochs, the learning rate is set to 0.0003 when finetuning pre-trained networks and 0.001 when training from scratch. The learning rate is then decayed by a factor of 1,000 over the course of training using a cosine schedule. (An illustrative optimizer and schedule sketch follows the table.)
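
The Pseudocode row points to Algorithm 1 (IPS and patch aggregation). To make the two-stage idea concrete, here is a minimal PyTorch sketch, assuming a generic patch `encoder` that maps each patch to a fixed-size embedding, a single learned query whose cross-attention weights serve as patch scores, and hypothetical hyperparameters `M` (patches kept in memory) and `I` (patches scored per iteration). It is only an illustration of the technique, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class IterativePatchSelection(nn.Module):
    """Sketch of IPS-style top-M patch selection followed by cross-attention
    aggregation. Module layout and hyperparameters are illustrative only."""

    def __init__(self, encoder: nn.Module, dim: int = 512, M: int = 32,
                 I: int = 64, num_classes: int = 10):
        super().__init__()
        self.encoder = encoder           # maps a patch image to a `dim`-d embedding
        self.M = M                       # number of patches kept in memory ("top-M")
        self.I = I                       # patches scored per iteration
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def attend(self, emb: torch.Tensor):
        # A learned query cross-attends over patch embeddings; the attention
        # weights double as per-patch importance scores for selection.
        q = self.query.expand(emb.size(0), -1, -1)
        out, w = self.attn(q, emb, emb, need_weights=True)
        return out.squeeze(1), w.squeeze(1)          # (B, dim), (B, P)

    def embed(self, patches: torch.Tensor) -> torch.Tensor:
        B, P = patches.shape[:2]
        return self.encoder(patches.flatten(0, 1)).view(B, P, -1)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, C, H, W) with N potentially very large.
        B, N = patches.shape[:2]
        mem = None                                   # currently selected patches
        with torch.no_grad():                        # selection runs without gradients
            for start in range(0, N, self.I):
                chunk = patches[:, start:start + self.I]
                cand = chunk if mem is None else torch.cat([mem, chunk], dim=1)
                _, scores = self.attend(self.embed(cand))
                idx = scores.topk(min(self.M, cand.size(1)), dim=1).indices
                mem = torch.gather(
                    cand, 1,
                    idx[..., None, None, None].expand(-1, -1, *cand.shape[2:]))
        # Only the M selected patches are re-encoded with gradients and aggregated.
        pooled, _ = self.attend(self.embed(mem))
        return self.head(pooled)
```

The efficiency argument is that the scan over all N patches runs under `torch.no_grad()`, so no activations are stored for them; gradients flow only through the M patches that survive selection and are re-encoded for aggregation.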
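
The Hardware Specification and Software Dependencies rows refer to throughput and peak-memory measurements with `torch.cuda.Event` on a single A100. A measurement along those lines could look like the following sketch; the `benchmark` helper, warm-up count, and iteration count are assumptions for illustration, not details taken from the paper.

```python
import torch

def benchmark(model: torch.nn.Module, batch: torch.Tensor,
              warmup: int = 10, iters: int = 100):
    """Hypothetical helper: measure images/s and peak GPU memory of a forward
    pass using CUDA events, in the spirit of the timing setup quoted above."""
    model = model.cuda().eval()
    batch = batch.cuda()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(warmup):               # warm-up excludes one-off costs
            model(batch)
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()  # track peak memory of the timed region only
        start.record()
        for _ in range(iters):
            model(batch)
        end.record()
        torch.cuda.synchronize()              # wait for all kernels before reading timers
    ms_per_batch = start.elapsed_time(end) / iters
    images_per_s = batch.size(0) * 1000.0 / ms_per_batch
    peak_mem_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    return images_per_s, peak_mem_gb
```

Recording events and synchronizing before reading the timer matters because CUDA kernels launch asynchronously; without the final synchronize, the elapsed time would mostly reflect launch overhead rather than actual compute.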
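
The Experiment Setup row quotes AdamW with weight decay 0.1, a 10-epoch linear warm-up, and a cosine decay of the learning rate by a factor of 1,000. One way to express that schedule with stock PyTorch, stepping once per epoch, is sketched below; the `build_optimizer` helper and its defaults are illustrative, not the released training code.

```python
import math
import torch

def build_optimizer(model, total_epochs, warmup_epochs=10,
                    base_lr=3e-4, final_factor=1e-3):
    """Hypothetical helper: AdamW plus linear warm-up followed by a cosine
    decay from base_lr down to base_lr * final_factor (a factor of 1,000)."""
    opt = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.1)

    def lr_lambda(epoch: int) -> float:
        if epoch < warmup_epochs:
            # Linear warm-up up to base_lr over the first warmup_epochs epochs.
            return (epoch + 1) / warmup_epochs
        # Cosine decay from 1.0 down to final_factor over the remaining epochs.
        t = (epoch - warmup_epochs) / max(1, total_epochs - 1 - warmup_epochs)
        return final_factor + (1.0 - final_factor) * 0.5 * (1.0 + math.cos(math.pi * t))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched  # call sched.step() once per epoch
```

For example, `build_optimizer(model, total_epochs=150)` would correspond to the 150-epoch runs quoted above (50 for CAMELYON16). Per the quoted setup, training from scratch would use a base learning rate of 0.001 instead of 0.0003, and a per-step variant of the schedule would divide by the number of batches per epoch.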