Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation

Authors: Divyanshu Mishra, Mohammadreza Salehi, Pramit Saha, Olga Patey, Aris Papageorghiou, Yuki Asano, Alison Noble

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on six echocardiography datasets spanning fetal, pediatric, and adult populations, DISCOVR outperforms both specialized video anomaly detection methods and state-of-the-art video-SSL baselines in zero-shot and linear probing setups, achieving superior segmentation transfer and strong downstream performance on clinically relevant tasks such as LVEF prediction.
Researcher Affiliation | Academia | (1) Department of Engineering Science, University of Oxford; (2) Nuffield Department of Women's and Reproductive Health, University of Oxford; (3) Fundamental AI Lab, University of Technology Nuremberg; (4) University of Amsterdam
Pseudocode | No | The paper describes the methodology in prose, detailing the video self-distillation, masked image self-distillation, and semantic cluster distillation processes within the 'Methodology' section, but does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at: https://github.com/mdivyanshu97/DISCOVR
Open Datasets | Yes | For adult and pediatric echocardiography, we use 3 public datasets: EchoNet Dynamic (apical 4CH adult; 7378/1326/1326) [28], Echo Pediatric LVH (parasternal long-axis pediatric; 7837/1592/1592) [8], and RVENet (right ventricular pediatric/adult; 2516/487/573) [20]. For the downstream segmentation task, we utilize the CAMUS [19] dataset. ... (b) EchoNet-Dynamic and EchoNet-Pediatric: Both are publicly available datasets... For more information, see: https://echonet.github.io/dynamic/, https://echonet.github.io/pediatric/. (c) RVENet: This dataset is available for non-commercial research use... For more information, see: https://rvenet.github.io/dataset/. (d) CAMUS: This dataset is publicly available for research use...
Dataset Splits | Yes | Two private fetal heart datasets... Fetal Echo1 includes 8273/414/317 and Fetal Echo2 includes 4154/320/305 videos for training/validation/testing. For adult and pediatric echocardiography, we use 3 public datasets: EchoNet Dynamic (apical 4CH adult; 7378/1326/1326) [28], Echo Pediatric LVH (parasternal long-axis pediatric; 7837/1592/1592) [8], and RVENet (right ventricular pediatric/adult; 2516/487/573) [20].
Hardware Specification | Yes | All models are implemented in PyTorch 2.6 and trained on RTX 8000 GPUs (48 GB) with a batch size of 8 using the AdamW optimizer.
Software Dependencies | Yes | All models are implemented in PyTorch 2.6 and trained on RTX 8000 GPUs (48 GB) with a batch size of 8 using the AdamW optimizer.
Experiment Setup | Yes | All models are implemented in PyTorch 2.6 and trained on RTX 8000 GPUs (48 GB) with a batch size of 8 using the AdamW optimizer. Videos are processed as 64-frame clips sampled at a stride of 3 and resized to 112 × 112. For both video and image self-distillation, we use a student-teacher setup where the teacher processes the full input and the student observes N = 4 randomly masked views. The teacher network is updated via an exponential moving average (EMA) of the student with momentum λ = 0.996. A fixed temperature τs = 0.1 is used for the student, while the teacher temperature τt is linearly warmed from 0.04 to 0.07 over the first 30 epochs. Semantic Cluster Distillation (SCD) uses K = 3000 learnable prototypes, with similarity scores computed via temperature-scaled dot products (τ = 0.1) and cluster assignments generated using the Sinkhorn-Knopp algorithm (10 iterations, ϵ = 0.05). Models are trained for 400 epochs with a learning rate of 1.5 × 10⁻⁴, weight decay of 0.05, and 40 warmup epochs.
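
For readers who want to see how the quoted hyperparameters fit together, the following is a minimal pure-Python sketch and not the authors' code (the official implementation is at the repository linked above). The function names (`clip_indices`, `teacher_temperature`, `ema_update`, `sinkhorn_knopp`) are hypothetical. The sketch assumes that the warmup is counted from epoch 0, that the EMA momentum stays constant, and that Sinkhorn-Knopp is applied in the SwAV-style alternating normalization; the quoted setup confirms none of these details.

```python
import math


def clip_indices(start, num_frames=64, stride=3):
    """Frame indices for one clip: 64 frames taken every 3rd frame."""
    return [start + i * stride for i in range(num_frames)]


def teacher_temperature(epoch, start=0.04, end=0.07, warmup_epochs=30):
    """Linear warmup of the teacher temperature, then constant.
    Assumption: epochs are counted from 0."""
    if epoch >= warmup_epochs:
        return end
    return start + (end - start) * epoch / warmup_epochs


def ema_update(teacher, student, momentum=0.996):
    """teacher <- momentum * teacher + (1 - momentum) * student.
    Parameters are flat lists of floats here for illustration."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def sinkhorn_knopp(scores, n_iters=10, eps=0.05):
    """Balanced soft cluster assignments from a samples x prototypes
    score matrix (assumed SwAV-style variant). Each returned row sums to 1."""
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract max for stability
    q = [[math.exp((s - top) / eps) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Column step: every prototype receives total mass 1/k.
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / (col[j] * k) for j in range(k)] for i in range(n)]
        # Row step: every sample receives total mass 1/n.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    # Rescale so each sample's assignment is a probability distribution.
    return [[v * n for v in row] for row in q]


if __name__ == "__main__":
    print(clip_indices(0)[:4], clip_indices(0)[-1])
    print(teacher_temperature(15))
    toy_scores = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.1, 0.0, 0.7]]
    for row in sinkhorn_knopp(toy_scores):
        print([round(v, 3) for v in row])
```

With these settings, one 64-frame clip at stride 3 spans 190 consecutive source frames. The toy example uses 3 prototypes for readability; the quoted setup uses K = 3000.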