Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Authors: Jianyang Gu, Sam Stevens, Elizabeth Campolongo, Matthew J Thompson, Net Zhang, Jiaman Wu, Andrei Kopanev, Zheda Mai, Alexander White, James Balhoff, Wasila Dahdul, Daniel Rubenstein, Hilmar Lapp, Tanya Berger-Wolf, Wei-Lun (Harry) Chao, Yu Su

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We then train BIOCLIP 2 on TREEOFLIFE-200M to distinguish different species. Despite the narrow training objective, BIOCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BIOCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings (e.g., beak sizes and habitats). At the intra-species level, instead of being diminished, the intra-species variations (e.g., life stages and sexes) are preserved and better separated in subspaces orthogonal to inter-species distinctions.
Researcher Affiliation	Academia	1The Ohio State University, 2Smithsonian Institution, 3UNC Chapel Hill, 4University of California, Irvine, 5Princeton University, 6Duke University
Pseudocode	No	The paper describes the methodology and provides a formal proof in Section 5.4 and Appendix C, but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Models, data, and code available at imageomics.github.io/bioclip-2.
Open Datasets	Yes	Models, data, and code available at imageomics.github.io/bioclip-2.
Dataset Splits	Yes	FishNet: The original train-test split is adopted, where 75,631 images are used in training, and the remaining 18,901 images are used for testing. This task evaluates whether the embedding distribution of different species is aligned with their ecological relationships. PlantDoc: One image per class is randomly selected from the original training split as the support set, and SimpleShot [86] is employed to predict the class labels for the test set. Accuracy over all the testing samples is reported as the performance.
Hardware Specification	Yes	We train BIOCLIP 2 on 32 NVIDIA H100 GPUs for 10 days on 214M organismal biology images with hierarchical labels and 26M randomly-sampled image-text pairs from LAION-2B for 30 epochs. We provide the training details in D. All the experiments are conducted with 1 NVIDIA A100 GPU, and the running time for each task is within 30 minutes.
Software Dependencies	No	The paper mentions various models and tools used (e.g., CLIP, SigLIP, DINOv3, Pytorchwildlife, MTCNN, TaxonoPy, distributed-downloader) but does not specify their version numbers or the versions of underlying software libraries like Python or PyTorch.
Experiment Setup	Yes	Table 4: The adopted hyper-parameter setting in training BIOCLIP 2. Architecture: ViT-L/14; Optimizer: Adam; Batch size/GPU (organism): 2,816; Batch size/GPU (replay): 320; GPUs: 32 H100s; Epochs: 30; Max learning rate: 1 × 10^-4; Warm-up steps: 1,875; Weight decay: 0.2; Input resolution: 224.
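
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that records them as a Python configuration object and derives the per-step global batch size. The class and field names are hypothetical and do not come from the paper or its code release. The global batch figure assumes that the organism and replay batches are combined in every step across all 32 GPUs with no gradient accumulation, which the quoted table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters quoted from Table 4; field names are hypothetical."""
    architecture: str = "ViT-L/14"
    optimizer: str = "Adam"
    batch_per_gpu_organism: int = 2816
    batch_per_gpu_replay: int = 320
    num_gpus: int = 32
    epochs: int = 30
    max_lr: float = 1e-4
    warmup_steps: int = 1875
    weight_decay: float = 0.2
    input_resolution: int = 224

    def global_batch(self) -> int:
        # Assumption: both per-GPU batches are used in each step
        # and there is no gradient accumulation.
        per_gpu = self.batch_per_gpu_organism + self.batch_per_gpu_replay
        return per_gpu * self.num_gpus


cfg = TrainConfig()
print(cfg.global_batch())  # 100352 under the assumption above
```

Under that assumption, each step would see 90,112 organism images and 10,240 replay image-text pairs.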