Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
WhAM: Towards A Translative Model of Sperm Whale Vocalization
Authors: Orr Paradise, Liangyuan Chen, Pranav Muralikrishnan, Hugo Flores, Bryan Pardo, Roee Diamant, David Gruber, Shane Gero, Shafi Goldwasser
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate WhAM's synthetic codas using Fréchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification. |
| Researcher Affiliation | Academia | ¹UC Berkeley, ²Project CETI, ³Northwestern University, ⁴Haifa University, ⁵City University of New York, ⁶Carleton University |
| Pseudocode | No | The paper describes the methodological framework in Section 3 and details model training in Appendix E, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The NeurIPS Paper Checklist states: 'Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We will release the model weights, and the code used for training and evaluating the model upon publication.' |
| Open Datasets | Yes | Freesound Dataset [Font et al., 2013]... AudioSet [Gemmeke et al., 2017a]... The Watkins Marine Mammal Sound Database [Sayigh et al., 2016]... BirdSet [Rauch et al., 2025] |
| Dataset Splits | Yes | We split the dataset into 80% training and 20% testing, using stratified sampling of labels to ensure consistent label distribution. |
| Hardware Specification | Yes | The model took 123 hours to train using an AWS EC2 g5.2xlarge instance (NVIDIA A10 GPU, 8 vCPUs, 32 GB of memory). |
| Software Dependencies | No | The paper mentions optimizers and activation functions but does not specify version numbers for any key software components or libraries used for implementation (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | The model was trained for 500,000 iterations using the AdamW optimizer with a learning rate of 0.0001. A batch size of 6 was used, and gradient clipping was applied to stabilize training... The fine-tuning process used the same optimizer and learning rate as the pretraining phase and a batch size of 6... Training is performed on an NVIDIA A10G GPU for 10 epochs, using a learning rate of 10⁻⁴ and a batch size of 32. |
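
The Dataset Splits entry quotes an 80% training / 20% testing split with stratified sampling of labels. Because the paper's code is not yet released, the following is only a minimal sketch of what such a split could look like, using the Python standard library. The function name `stratified_split`, the seed, and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each label keeps roughly the same
    share in both subsets. Illustrative only; not the authors' code."""
    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Hold out the requested fraction of every label separately.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed for illustration): 50 "a", 30 "b", 20 "c".
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each label group, rather than over the pooled data, is what keeps the label distribution consistent between the training and testing subsets, which matters when some classes are rare.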