Iterative Patch Selection for High-Resolution Image Recognition
Authors: Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance and efficiency of our method on three challenging datasets from a variety of domains and training regimes: Multi-class recognition of distant traffic signs in megapixel images, weakly-supervised classification in gigapixel whole-slide images (WSI) using self-supervised representations, and multi-task learning of inter-patch relations on a synthetic megapixel MNIST benchmark. |
| Researcher Affiliation | Collaboration | Benjamin Bergner^1, Christoph Lippert^{1,2}, Aravindh Mahendran^3; ^1 Hasso Plattner Institute for Digital Engineering, University of Potsdam; ^2 Hasso Plattner Institute for Digital Health at the Icahn School of Medicine at Mount Sinai; ^3 Google Research, Brain Team |
| Pseudocode | Yes | Appendix A (Pseudocode), Algorithm 1: Pseudocode for IPS and Patch Aggregation |
| Open Source Code | Yes | We discuss these in detail next and provide code at https://github.com/benbergner/ips. |
| Open Datasets | Yes | We first evaluate our method on the Swedish traffic signs dataset, which consists of 747 training and 684 test images with 1.3 megapixel resolution, as in Katharopoulos & Fleuret (2019). ... Next, we consider the CAMELYON16 dataset (Litjens et al., 2018), which consists of 270 training and 129 test WSIs of gigapixel resolution... megapixel MNIST introduced in Katharopoulos & Fleuret (2019) requires the recognition of multiple patches and their relations. The dataset consists of 5,000 training and 1,000 test images of size 1,500×1,500. |
| Dataset Splits | Yes | We first evaluate our method on the Swedish traffic signs dataset, which consists of 747 training and 684 test images with 1.3 megapixel resolution... Next, we consider the CAMELYON16 dataset (Litjens et al., 2018), which consists of 270 training and 129 test WSIs... megapixel MNIST introduced in Katharopoulos & Fleuret (2019)... consists of 5,000 training and 1,000 test images... |
| Hardware Specification | Yes | Both metrics are calculated on a single NVIDIA A100 GPU in all experiments. |
| Software Dependencies | No | The paper mentions using `torch.cuda.Event` for timing, implying the use of PyTorch, but it does not specify version numbers for PyTorch, CUDA, or other software dependencies. |
| Experiment Setup | Yes | All models are trained for 150 epochs (megapixel MNIST, traffic signs) or 50 epochs (CAMELYON16) on the respective training sets... The batch size is 16, and AdamW with weight decay of 0.1 is used as optimizer (Loshchilov & Hutter, 2017). After a linear warm-up period of 10 epochs, the learning rate is set to 0.0003 when finetuning pre-trained networks and 0.001 when training from scratch. The learning rate is then decayed by a factor of 1,000 over the course of training using a cosine schedule. |
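The Experiment Setup row describes a linear warm-up followed by cosine decay of the learning rate by a factor of 1,000. A minimal sketch of such a schedule is below; the function name and the exact shape of the warm-up are assumptions, not taken from the paper's code.

```python
import math

def lr_at_epoch(epoch, total_epochs, base_lr, warmup_epochs=10, decay_factor=1000.0):
    """Linear warm-up to base_lr, then cosine decay to base_lr / decay_factor.

    Mirrors the setup quoted above: 10 warm-up epochs, peak learning rate
    0.0003 (finetuning) or 0.001 (from scratch), decay by a factor of 1,000.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the cosine phase completed, in [0, 1].
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    min_lr = base_lr / decay_factor
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

For example, with `base_lr=0.0003` and 150 epochs (the megapixel MNIST / traffic signs setting), the rate peaks at 0.0003 after warm-up and ends near 0.0000003.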