RLogist: Fast Observation Strategy on Whole-Slide Images with Deep Reinforcement Learning

Authors: Boxuan Zhao, Jun Zhang, Deheng Ye, Jian Cao, Xiao Han, Qiang Fu, Wei Yang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that the performance of our RL agent is comparable to state-of-the-art methods while having a significantly shortened observation path." "We benchmark our method on two whole-slide level classification tasks, including detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer."
Researcher Affiliation | Collaboration | "1 Tencent AI Lab, 2 Shanghai Jiao Tong University"
Pseudocode | Yes | "Algorithm 1: The Agent-Environment Loop at Step t"; "Algorithm 2: DRL Training Pipeline" (A generic sketch of such an agent-environment loop follows the table.)
Open Source Code | Yes | "Our code is available at: https://github.com/tencent-ailab/RLogist."
Open Datasets | Yes | "We use two publicly available WSI datasets, i.e., the CAMELYON16 lymph node WSI dataset¹, and the TCGA-NSCLC lung cancer dataset²", with corresponding footnotes: ¹ https://camelyon16.grand-challenge.org/ ² https://www.cancer.gov/tcga
Dataset Splits | No | "CAMELYON16 consists of 270 annotated whole slides for training and another 129 slides as a held-out official test set. We directly use the officially divided training set and test set in our experiments." and "We randomly split the dataset into 835 training slides and 209 testing slides with stratified sampling from 2 different TCGA projects." The paper does not explicitly mention a separate validation split or its proportion. (A sketch of one way to reproduce such a stratified split follows the table.)
Hardware Specification | Yes | "All models in the experiment are implemented in PyTorch and trained until convergence on half NVIDIA Tesla T4 GPU."
Software Dependencies | No | The paper mentions software such as PyTorch, PPO2, stable-baselines3, openai/baselines, and CleanRL, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "We use the Adam optimizer with an annealing learning rate, which is initialized as 2.5 × 10⁻⁴ and decays to 0 as the number of time steps increases, and ϵ = 1 × 10⁻⁵ to update the model weights during the training of our policy network. Cross-entropy loss is adopted to calculate the trajectory reward." (A PyTorch sketch of this optimizer configuration follows the table.)
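The "Pseudocode" row references the paper's Algorithm 1, an agent-environment loop at step t. The paper's exact pseudocode is not reproduced here, so the following is only a minimal, generic Gymnasium-style sketch of what such a loop typically looks like; `WSIEnv`-style environment details, the policy callable, and the step budget are placeholders, not the authors' implementation.

```python
# Generic agent-environment rollout loop (not the paper's exact Algorithm 1).
# The environment and policy are hypothetical placeholders following the Gymnasium API.
import gymnasium as gym


def run_episode(env: gym.Env, policy, max_steps: int = 50) -> float:
    """Roll out one episode: at each step the agent observes, acts, and receives a reward."""
    obs, info = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(obs)                    # agent picks its next action from the observation
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:             # episode ends (e.g., observation budget exhausted)
            break
    return total_reward
```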
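The "Dataset Splits" row reports a stratified random split of the TCGA-NSCLC slides into 835 training and 209 testing slides across the 2 TCGA projects. The paper does not describe how the split was implemented; the sketch below is one plausible way to do it, and the CSV file name, column names, use of scikit-learn, and random seed are all assumptions.

```python
# Hypothetical sketch of a stratified slide-level split (835 train / 209 test),
# assuming a table of slide IDs with their TCGA project (e.g., LUAD/LUSC) as the stratum.
import pandas as pd
from sklearn.model_selection import train_test_split

slides = pd.read_csv("tcga_nsclc_slides.csv")   # assumed columns: slide_id, project

train_slides, test_slides = train_test_split(
    slides,
    test_size=209,                # 209 held-out test slides; 835 remain for training
    stratify=slides["project"],   # keep the per-project proportions in both splits
    random_state=0,               # fixed seed for reproducibility
)
print(len(train_slides), len(test_slides))      # 835 209 (given 1044 slides in total)
```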
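The "Experiment Setup" row describes Adam with an ϵ of 1 × 10⁻⁵, a learning rate annealed linearly from 2.5 × 10⁻⁴ to 0 over the training time steps, and a cross-entropy loss. A minimal PyTorch sketch of that configuration is given below; the policy network stand-in and the total step count are placeholders, not values from the paper.

```python
# Sketch of the reported optimizer settings: Adam with eps=1e-5 and a learning rate
# that starts at 2.5e-4 and decays linearly to 0 as training steps accumulate.
# `policy_net` and `total_steps` are placeholders, not values from the paper.
import torch

policy_net = torch.nn.Linear(512, 2)             # stand-in for the actual policy network
total_steps = 100_000                            # assumed total number of training steps

optimizer = torch.optim.Adam(policy_net.parameters(), lr=2.5e-4, eps=1e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps),  # linear anneal to 0
)

criterion = torch.nn.CrossEntropyLoss()          # cross-entropy loss, as stated in the paper
```

In a training loop, `scheduler.step()` would be called once per time step so the learning-rate multiplier reaches 0 at `total_steps`.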