Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

WhAM: Towards A Translative Model of Sperm Whale Vocalization

Authors: Orr Paradise, Liangyuan Chen, Pranav Muralikrishnan, Hugo Flores, Bryan Pardo, Roee Diamant, David Gruber, Shane Gero, Shafi Goldwasser

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate WhAM's synthetic codas using Fréchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification.
Researcher Affiliation	Academia	(1) UC Berkeley, (2) Project CETI, (3) Northwestern University, (4) Haifa University, (5) City University of New York, (6) Carleton University
Pseudocode	No	The paper describes the methodological framework in Section 3 and details model training in Appendix E, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The NeurIPS Paper Checklist states: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We will release the model weights, and the code used for training and evaluating the model upon publication.'
Open Datasets	Yes	Freesound Dataset [Font et al., 2013]... AudioSet [Gemmeke et al., 2017a]... The Watkins Marine Mammal Sound Database [Sayigh et al., 2016]... BirdSet [Rauch et al., 2025]
Dataset Splits	Yes	We split the dataset into 80% training and 20% testing, using stratified sampling of labels to ensure consistent label distribution.
Hardware Specification	Yes	The model took 123 hours to train using an AWS EC2 g5.2xlarge instance (NVIDIA A10 GPU, 8 vCPUs, 32 GB of memory).
Software Dependencies	No	The paper mentions optimizers and activation functions but does not specify version numbers for any key software components or libraries used for implementation (e.g., Python, PyTorch, TensorFlow, etc.).
Experiment Setup	Yes	The model was trained for 500,000 iterations using the AdamW optimizer with a learning rate of 0.0001. A batch size of 6 was used, and gradient clipping was applied to stabilize training... The fine-tuning process used the same optimizer and learning rate as the pretraining phase and a batch size of 6... Training is performed on an NVIDIA A10G GPU for 10 epochs, using a learning rate of 10^-4 and a batch size of 32.
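The Research Type row cites Fréchet Audio Distance (FAD) as the metric for WhAM's synthetic codas. For reference, the standard FAD fits one Gaussian to embeddings of real audio and another to embeddings of generated audio, then takes the Fréchet distance between the two; which embedding model was used for WhAM's evaluation is not stated in this summary.

```latex
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here \mu_r and \Sigma_r are the mean and covariance of the embeddings of real recordings, and \mu_g and \Sigma_g those of the synthetic codas. Lower values indicate that the two embedding distributions are closer.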
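The Dataset Splits row describes an 80/20 train/test split with stratified sampling on labels. A minimal standard-library sketch of that procedure follows. The function name `stratified_split`, the fixed seed, the per-class rounding rule, and the example labels are illustrative assumptions, not the authors' code, which per the Open Source Code row had not been released.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each label keeps roughly the same train/test ratio.

    Hypothetical helper: returns (train_indices, test_indices), both sorted.
    """
    # Group example indices by their label.
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        # Take test_fraction of *this* label's examples for the test set.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical social-unit labels for 100 codas.
labels = ["unit_A"] * 50 + ["unit_D"] * 30 + ["unit_F"] * 20
train, test = stratified_split(labels)
```

Splitting within each label, rather than shuffling the whole dataset once, keeps smaller classes (for example, a social unit with few recorded codas) represented in both partitions in about the same proportion as in the full dataset.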
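The Experiment Setup row names AdamW with a learning rate of 10^-4 plus gradient clipping, but the quoted excerpt gives neither the clipping threshold nor the betas, epsilon, or weight decay. The sketch below makes one clipped AdamW update explicit on a flat list of scalar parameters. It assumes clipping by global L2 norm with a hypothetical `MAX_NORM = 1.0` and PyTorch's default AdamW values (betas 0.9 and 0.999, eps 1e-8, weight decay 0.01); none of these assumed values come from the paper.

```python
import math

# Hypothetical threshold: the paper excerpt says gradient clipping was applied
# but does not state the norm type or the threshold.
MAX_NORM = 1.0


def clip_by_global_norm(grads, max_norm=MAX_NORM):
    """Rescale gradients so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        return [g * scale for g in grads]
    return list(grads)


class AdamW:
    """AdamW (Adam with decoupled weight decay) over a list of scalar parameters.

    lr=1e-4 matches the reported learning rate; betas, eps, and weight_decay
    are assumed PyTorch defaults, not values reported in the paper.
    """

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        self.params = list(params)
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.weight_decay = weight_decay
        self.m = [0.0] * len(self.params)  # first-moment (mean) estimates
        self.v = [0.0] * len(self.params)  # second-moment (uncentered variance) estimates
        self.t = 0  # step counter used for bias correction

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            # Decoupled weight decay: shrink the weight directly instead of
            # adding an L2 penalty to the gradient (what distinguishes AdamW from Adam + L2).
            self.params[i] -= self.lr * self.weight_decay * self.params[i]
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected variance
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return self.params


# One illustrative step: clip a raw gradient, then apply AdamW.
opt = AdamW([0.5, -0.25])
raw_grads = [3.0, 4.0]                     # global norm 5.0, above MAX_NORM
clipped = clip_by_global_norm(raw_grads)   # rescaled to norm 1.0 (about [0.6, 0.8])
new_params = opt.step(clipped)
```

In practice these updates would be applied to model tensors by a framework optimizer such as `torch.optim.AdamW` together with `torch.nn.utils.clip_grad_norm_`; the scalar version only spells out the update rule that such a configuration implies.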