RLogist: Fast Observation Strategy on Whole-Slide Images with Deep Reinforcement Learning
Authors: Boxuan Zhao, Jun Zhang, Deheng Ye, Jian Cao, Xiao Han, Qiang Fu, Wei Yang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the performance of our RL agent is comparable to state-of-the-art methods while having a significantly shortened observation path. We benchmark our method on two whole-slide level classification tasks, including detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer. |
| Researcher Affiliation | Collaboration | Tencent AI Lab; Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: The Agent-Environment Loop at Step t; Algorithm 2: DRL Training Pipeline |
| Open Source Code | Yes | Our code is available at: https://github.com/tencent-ailab/RLogist. |
| Open Datasets | Yes | "We use two publicly available WSI datasets, i.e., the CAMELYON16 lymph node WSI dataset, and the TCGA-NSCLC lung cancer dataset" with corresponding footnotes: https://camelyon16.grand-challenge.org/ and https://www.cancer.gov/tcga |
| Dataset Splits | No | "CAMELYON16 consists of 270 annotated whole slides for training and another 129 slides as a held-out official test set. We directly use the officially divided training set and test set in our experiments." and "We randomly split the dataset into 835 training slides and 209 testing slides with stratified sampling from 2 different TCGA projects." The paper does not explicitly mention a separate validation split or its proportion. (See the stratified-split sketch after this table.) |
| Hardware Specification | Yes | All models in the experiment are implemented in PyTorch and trained until convergence on half an NVIDIA Tesla T4 GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch, PPO2, stable-baselines3, openai/baselines, and CleanRL, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer with an annealing learning rate, which is initialized as 2.5 × 10⁻⁴ and decays to 0 as the number of time steps increases, and ϵ = 1 × 10⁻⁵ to update the model weights during the training of our policy network. Cross-entropy loss is adopted to calculate the trajectory reward. (See the optimizer sketch after this table.) |
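
The Dataset Splits row reports an 835/209 TCGA-NSCLC split with stratified sampling over two TCGA projects. Below is a minimal sketch of such a split, assuming scikit-learn's `train_test_split`; the slide IDs, per-project counts, and random seed are placeholders, since the paper does not state its splitting tool or seed.

```python
# Hedged sketch of a stratified slide-level split (835 train / 209 test).
# Assumes scikit-learn; slide IDs and per-project counts are illustrative only.
from sklearn.model_selection import train_test_split

slide_ids = [f"slide_{i:04d}" for i in range(1044)]    # 835 + 209 slides
project_labels = ["LUAD"] * 541 + ["LUSC"] * 503       # placeholder per-project counts

train_slides, test_slides = train_test_split(
    slide_ids,
    test_size=209,            # 209 testing slides, as reported
    stratify=project_labels,  # stratify by TCGA project
    random_state=0,           # fixed seed for reproducibility (assumption)
)
print(len(train_slides), len(test_slides))  # 835 209
```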
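
The Experiment Setup row corresponds to the following minimal PyTorch sketch: Adam with ϵ = 1 × 10⁻⁵ and a learning rate annealed linearly from 2.5 × 10⁻⁴ to 0. The policy network, batch, total update count, and the `LambdaLR` schedule are assumptions for illustration; in the paper, the cross-entropy loss is used to compute a trajectory reward inside a PPO pipeline, whereas this sketch only shows the optimizer, schedule, and loss wiring.

```python
# Hedged sketch of the reported training hyperparameters (not the authors' code).
import torch
import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 2))  # placeholder network

total_updates = 10_000  # assumed training length
optimizer = torch.optim.Adam(policy_net.parameters(), lr=2.5e-4, eps=1e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_updates)  # linear anneal to 0
)
criterion = nn.CrossEntropyLoss()  # cross-entropy, applied here to dummy slide labels

for update in range(total_updates):
    logits = policy_net(torch.randn(8, 512))  # dummy batch of slide-level features
    labels = torch.randint(0, 2, (8,))        # dummy binary slide labels
    loss = criterion(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decay the learning rate toward 0 over training
```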