SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Authors: Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. We conduct extensive experiments on the challenging EndoVis2018 and EndoVis2017 datasets, achieving state-of-the-art (SOTA) performance while significantly improving training efficiency. |
| Researcher Affiliation | Academia | ¹School of Computer Science, The University of Sydney; ²School of Computer Science, Northwestern Polytechnical University; ³Department of Computer Science, University of Rochester. {wenxi.yue, jing.zhang1, kun.hu, zhiyong.wang}@sydney.edu.au, yxia@nwpu.edu.cn, jluo@cs.rochester.edu |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The source code is available at https://github.com/wenxi-yue/SurgicalSAM. |
| Open Datasets | Yes | We use the EndoVis2018 (Allan et al. 2020) and EndoVis2017 (Allan et al. 2019) datasets and adhere to the standard protocols defined by Shvets et al. (2018) and González, Bravo-Sánchez, and Arbeláez (2020). |
| Dataset Splits | Yes | EndoVis2017 consists of eight videos, each with 255 frames, for which we perform 4-fold cross-validation following Shvets et al. (2018). EndoVis2018 offers 11 training videos and four validation videos with each consisting of 149 frames. |
| Hardware Specification | Yes | Our model is implemented using PyTorch and trained and evaluated on an Nvidia Tesla V100 16GB GPU. |
| Software Dependencies | No | Our model is implemented using PyTorch and trained and evaluated on an Nvidia Tesla V100 16GB GPU. (PyTorch version is not specified.) |
| Experiment Setup | Yes | For the prototype-based prompt encoder, the intermediate dimensions r_D and r_S are both set to 128 and the number of tokens per class n is set to 2 and 4 for EndoVis2018 and EndoVis2017, respectively. For the prototype contrastive loss, a temperature τ of 0.07 is used. We employ an Adam optimiser with a learning rate of 0.001 and 0.0001 for EndoVis2018 and EndoVis2017, respectively. To reduce computational load, we adopt pre-computed image embeddings in training, employing a batch size of 32. |
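
The Dataset Splits row above quotes a 4-fold cross-validation protocol over the eight 255-frame EndoVis2017 videos. Below is a minimal Python sketch of such a split generator; the fold-to-video grouping is hypothetical and shown only for illustration, as the actual assignment follows Shvets et al. (2018).

```python
# Sketch of 4-fold cross-validation over the eight EndoVis2017 videos
# (255 frames each). The fold grouping below is HYPOTHETICAL; the real
# protocol uses the fold assignment defined by Shvets et al. (2018).

VIDEOS = list(range(1, 9))                 # video IDs 1..8
FOLDS = [[1, 3], [2, 5], [4, 8], [6, 7]]   # illustrative grouping only

def cross_validation_splits(folds, videos=VIDEOS):
    """Yield (train_videos, val_videos) pairs, one per held-out fold."""
    for held_out in folds:
        train = [v for v in videos if v not in held_out]
        yield train, held_out

for i, (train, val) in enumerate(cross_validation_splits(FOLDS)):
    print(f"fold {i}: train={train} val={val}")
```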
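
The Experiment Setup row reports a temperature τ of 0.07 for the prototype contrastive loss, Adam with learning rates of 0.001/0.0001, and a batch size of 32 on pre-computed image embeddings. The PyTorch sketch below wires these values into an InfoNCE-style prototype contrastive term; the loss form is an assumption rather than the paper's exact formulation, and the linear `encoder` is a hypothetical stand-in for the prototype-based prompt encoder.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, prototypes, labels, tau=0.07):
    """InfoNCE-style loss pulling each embedding toward its class prototype.

    features:   (B, D) per-sample embeddings
    prototypes: (C, D) one prototype per instrument class
    labels:     (B,)   class indices in [0, C)
    tau:        temperature (0.07, as reported in the paper)
    """
    features = F.normalize(features, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = features @ prototypes.t() / tau   # (B, C) scaled cosine similarities
    return F.cross_entropy(logits, labels)

# Hypothetical stand-in for the prototype-based prompt encoder.
encoder = torch.nn.Linear(256, 128)

# Adam with lr 0.001 (EndoVis2018) or 0.0001 (EndoVis2017), batch size 32,
# operating on pre-computed image embeddings as described in the paper.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

feats = encoder(torch.randn(32, 256))               # batch of 32 embeddings
protos = F.normalize(torch.randn(7, 128), dim=-1)   # 7 instrument classes
loss = prototype_contrastive_loss(feats, protos, torch.randint(0, 7, (32,)))
loss.backward()
optimizer.step()
```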