Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Authors: Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple public surgical scene understanding and cross-modal retrieval datasets show that our proposed method significantly improves zero-shot transferring performance and offers a generalist visual representation for further advancements in surgical scene understanding. |
| Researcher Affiliation | Academia | (1) University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France; (2) IHU Strasbourg, Strasbourg, France; (3) CAMP, Technische Universität München, Munich, Germany |
| Pseudocode | Yes | Algorithm 1: DTW to align sequences using a cost matrix (see the DTW sketch below the table). |
| Open Source Code | No | The source code will be available at https://github.com/CAMMA-public/PeskaVLP. |
| Open Datasets | Yes | Our pretraining is conducted on the videos of the SVL [76] dataset. The pretraining dataset includes hierarchical textual annotations from the metadata of the videos [75]. |
| Dataset Splits | Yes | We fit the model on the training and validation sets and report the performance on the separate test set. |
| Hardware Specification | Yes | We train the model with 4 NVIDIA A100 GPUs each having a DRAM of 80 GB for 200 epochs. |
| Software Dependencies | No | The paper mentions software components like ResNet50, ClinicalBERT, AdamW, and torchvision, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We train the model with a batch size of 120/80/25 for clip-/phase-/video-level, respectively. We sample 4/16/64 frames for videos of clip-/phase-/video-level. We use the AdamW optimizer [30] with a learning rate of 5e-5. We train the model with 4 NVIDIA A100 GPUs, each having a DRAM of 80 GB, for 200 epochs. Temperature parameters β for the distance function and ϕ for the DTW-based contrastive loss function are fixed at 0.1. Scale factor λ is set to 0.01 (see the training-configuration sketch below the table). |
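
The pseudocode evidence refers to the paper's Algorithm 1, which applies dynamic time warping (DTW) over a cost matrix to align two sequences (e.g. video clips and procedural text). The snippet below is a minimal, illustrative sketch of classic DTW from a precomputed cost matrix; it is not the authors' Algorithm 1, and the function name and example shapes are assumptions made here for clarity.

```python
import numpy as np

def dtw_align(cost: np.ndarray) -> float:
    """Classic DTW over a precomputed cost matrix.

    cost[i, j] is the pairwise distance between element i of one
    sequence (e.g. a video clip) and element j of the other
    (e.g. a text sentence). Returns the minimal cumulative
    alignment cost. Illustrative sketch, not the paper's code.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three predecessor paths.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # step in the first sequence
                acc[i, j - 1],      # step in the second sequence
                acc[i - 1, j - 1],  # step in both (match)
            )
    return float(acc[n, m])

# Example: align a 4-clip video with a 3-sentence description.
rng = np.random.default_rng(0)
print(dtw_align(rng.random((4, 3))))
```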
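
As a reading aid, the hyperparameters quoted in the experiment-setup row can be collected into one configuration. The sketch below is a hypothetical PyTorch-style setup, assuming a generic placeholder model; the class name, field names, and the `Linear` stand-in are illustrative, while the numeric values are the ones quoted from the paper.

```python
import torch
from dataclasses import dataclass, field

@dataclass
class PretrainConfig:
    # Values quoted from the paper's experiment setup; names are assumptions.
    batch_sizes: dict = field(default_factory=lambda: {"clip": 120, "phase": 80, "video": 25})
    frames: dict = field(default_factory=lambda: {"clip": 4, "phase": 16, "video": 64})
    learning_rate: float = 5e-5
    epochs: int = 200
    beta: float = 0.1          # temperature for the distance function
    phi: float = 0.1           # temperature for the DTW-based contrastive loss
    lambda_scale: float = 0.01 # scale factor λ

cfg = PretrainConfig()

# Placeholder model: the paper pairs a ResNet50 visual encoder with a
# ClinicalBERT text encoder; a Linear layer stands in here for brevity.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
```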