Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
Authors: Bryan Wong, Jongwoo Kim, Huazhu Fu, Mun Yi
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on TCGA breast, lung, and kidney cancer datasets demonstrate that HiVE-MIL consistently outperforms both traditional MIL and recent VLM-based MIL approaches, achieving gains of up to 4.1% in macro F1 under 16-shot settings. |
| Researcher Affiliation | Academia | 1 KAIST; 2 IHPC, A*STAR |
| Pseudocode | Yes | D Text-Guided Dynamic Filtering Pseudocode Algorithm 1 Text-Guided Dynamic Filtering (TGDF) |
| Open Source Code | Yes | The code is available at https://github.com/bryanwong17/HiVE-MIL. |
| Open Datasets | Yes | 4.1 Experimental Settings. Datasets. We utilize three publicly available WSI datasets: TCGA-NSCLC (lung), TCGA-BRCA (breast), and TCGA-RCC (kidney), obtained from The Cancer Genome Atlas (TCGA). |
| Dataset Splits | Yes | Following [48], each dataset is split into training, validation, and test sets using a fixed 4:3:3 ratio. For the few-shot setting, we randomly sample 4, 8, and 16 WSIs per class from the training set. |
| Hardware Specification | Yes | All experiments are run using PyTorch [42] on a workstation with two NVIDIA RTX A100 GPUs. |
| Software Dependencies | No | All experiments are run using PyTorch [42] on a workstation with two NVIDIA RTX A100 GPUs. Our graph-based modules are implemented using PyTorch Geometric [13]. All experiments are conducted on Ubuntu 20.04.6 using a workstation equipped with two NVIDIA A100 GPUs (40 GB each); however, only one GPU is used for training each model. Complete package versions and dependencies are listed in the requirements.txt file available in our GitHub repository (linked in the Abstract). |
| Experiment Setup | Yes | HiVE-MIL operates on 5× and 20× patches, using GPT-4o [1] to generate O = 4 coarse-level texts and K = 3 fine-level substructures per class. We use L = 16 learnable context tokens (Eq. 1) and apply the TGDF threshold α = 0.5 (Eqs. 2, 3). The HHG consists of two layers and a 2-head MSA. HTCL is used with λ = 0.5 (Eq. 9). We train using the Adam optimizer [31] (learning rate: 1e-4, weight decay: 1e-5), batch size 1, for up to 50 epochs with early stopping (patience 10). |
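
The Dataset Splits row describes a fixed 4:3:3 train/validation/test split followed by random sampling of 4, 8, or 16 WSIs per class from the training set. The sketch below illustrates one way such a protocol could be reproduced. It is not the authors' code: the function names `split_4_3_3` and `sample_k_shot`, the seed values, the synthetic slide IDs, and the two-class labels are all assumptions made for illustration, and the excerpt does not say whether the original split is stratified by class (this sketch is not).

```python
import random
from collections import defaultdict


def split_4_3_3(slide_ids, seed=0):
    """Shuffle slide IDs and split them 4:3:3 into train/val/test.

    Hypothetical helper; the seed and shuffling scheme are assumptions.
    """
    ids = sorted(slide_ids)              # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 4 // 10         # integer arithmetic avoids float rounding
    n_val = len(ids) * 3 // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]         # remainder goes to the test set
    return train, val, test


def sample_k_shot(train_ids, labels, k, seed=0):
    """Randomly draw k slides per class from the training split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sid in train_ids:
        by_class[labels[sid]].append(sid)
    shots = []
    for cls in sorted(by_class):
        if len(by_class[cls]) < k:
            raise ValueError(f"class {cls!r} has fewer than {k} training slides")
        shots.extend(rng.sample(by_class[cls], k))
    return shots


# Synthetic stand-in for a two-class WSI dataset (illustrative only).
slides = [f"slide_{i:03d}" for i in range(200)]
labels = {sid: i % 2 for i, sid in enumerate(slides)}

train, val, test = split_4_3_3(slides, seed=0)
shots = {k: sample_k_shot(train, labels, k, seed=0) for k in (4, 8, 16)}
```

Sampling the few-shot subsets only from `train` keeps the validation and test sets untouched across shot settings, which matches the description in the table; whether the original experiments reuse one seed or average over several is not stated in the excerpt.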