Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation

Authors: Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple public surgical scene understanding and cross-modal retrieval datasets show that our proposed method significantly improves zero-shot transferring performance and offers a generalist visual representation for further advancements in surgical scene understanding.
Researcher Affiliation | Academia | 1 University of Strasbourg, CNRS, INSERM, ICube, UMR7357, Strasbourg, France; 2 IHU Strasbourg, Strasbourg, France; 3 CAMP, Technische Universität München, Munich, Germany
Pseudocode | Yes | Algorithm 1: DTW to align sequences using a cost matrix (a sketch of the standard recurrence follows the table).
Open Source Code | No | The source code will be available at https://github.com/CAMMA-public/PeskaVLP.
Open Datasets | Yes | Our pretraining is conducted on the videos of the SVL [76] dataset. The pretraining dataset includes hierarchical textual annotations from the metadata of the videos [75].
Dataset Splits | Yes | We fit the model on the training and validation sets and report the performance on the separate test set.
Hardware Specification | Yes | We train the model with 4 NVIDIA A100 GPUs, each having a DRAM of 80 GB, for 200 epochs.
Software Dependencies | No | The paper mentions software components such as ResNet50, ClinicalBERT, AdamW, and torchvision, but does not provide version numbers for any of these dependencies.
Experiment Setup | Yes | We train the model with a batch size of 120/80/25 for clip-/phase-/video-level, respectively. We sample 4/16/64 frames for videos of clip-/phase-/video-level. We use the AdamW optimizer [30] with a learning rate of 5e-5. We train the model with 4 NVIDIA A100 GPUs, each having a DRAM of 80 GB, for 200 epochs. The temperature parameter β for the distance function and ϕ for the DTW-based contrastive loss function D are fixed at 0.1. The scale factor λ is set to 0.01. (See the configuration sketch after the table.)
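For reference, below is a minimal NumPy sketch of the standard DTW recurrence over a precomputed cost matrix, which is what the Pseudocode row points to. This is not the authors' Algorithm 1 verbatim: the function name dtw_align and the cosine-distance cost construction in the example at the bottom are illustrative assumptions.

```python
import numpy as np

def dtw_align(cost: np.ndarray) -> tuple[float, list[tuple[int, int]]]:
    """Align two sequences with dynamic time warping.

    cost[i, j] is a precomputed pairwise distance between element i of
    the first sequence and element j of the second. Returns the total
    accumulated alignment cost and the warping path as (i, j) pairs.
    """
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match: advance both sequences
                acc[i - 1, j],      # advance the first sequence only
                acc[i, j - 1],      # advance the second sequence only
            )
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]

# Example: cost matrix from L2-normalized embeddings (hypothetical data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 32))
sents = rng.normal(size=(4, 32))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
sents /= np.linalg.norm(sents, axis=1, keepdims=True)
total_cost, path = dtw_align(1.0 - frames @ sents.T)
```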
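The Experiment Setup row likewise translates into a short PyTorch configuration sketch. Only the hyperparameter values are taken from the paper; the nn.Linear module is a stand-in for the actual dual encoder (the paper mentions ResNet50 and ClinicalBERT), and all variable names are assumptions.

```python
import torch
from torch import nn

# Hyperparameters reported in the Experiment Setup row.
LR = 5e-5                                    # AdamW learning rate
BETA = 0.1                                   # temperature of the distance function
PHI = 0.1                                    # temperature of the DTW-based loss D
LAMBDA = 0.01                                # scale factor
BATCH_SIZE = {"clip": 120, "phase": 80, "video": 25}
NUM_FRAMES = {"clip": 4, "phase": 16, "video": 64}
EPOCHS = 200                                 # on 4x NVIDIA A100 (80 GB each)

# Placeholder standing in for the ResNet50 + ClinicalBERT dual encoder.
model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
```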