Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Flexible inference for animal learning rules using neural networks

Authors: Yuhan Helena Liu, Victor Geadah, Jonathan Pillow

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional RL learning rules at predicting the learning trajectories of held-out mice.
Researcher Affiliation | Academia | Yuhan Helena Liu1,*, Victor Geadah1, and Jonathan Pillow1,* 1Princeton University, Princeton, NJ, USA *Correspondence: EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1 Learning Rule Inference from Trial-by-Trial Behavior
Open Source Code | Yes | Our code is available https://github.com/Helena-Yuhan-Liu/Infer Learning ANNGLM.
Open Datasets | Yes | For real data, we used the publicly available International Brain Laboratory (IBL) dataset [31], following the same preprocessing as in [19].
Dataset Splits | Yes | We evaluate model performance using 5-fold cross-validation, where each fold holds out a subset of animals entirely. ... Second, to assess temporal generalization, we adopt a future-data holdout strategy [103].
Hardware Specification | Yes | Simulations were run on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700H processor (14 cores, 20 threads) with a maximum clock speed of 2.30 GHz.
Software Dependencies | Yes | We implemented our models in PyTorch v1.13.1 [105]. ... For statistical analysis and significance testing, we used tools from the SciPy package [106].
Experiment Setup | Yes | We tune hyperparameters, including the number of hidden units, number of layers, learning rate, and number of training epochs. We used a batch size of 1 for simulated data. For real IBL data, we trained on all animals jointly; using a batch size of 1 instead did not affect our conclusions. ... We used the Adam optimizer with a default learning rate of 1e-3, and binary cross-entropy loss, which corresponds to the negative log-likelihood under a Bernoulli model.
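
The Dataset Splits and Experiment Setup rows describe two ideas that a short example may make more concrete: cross-validation folds that hold out whole animals, and binary cross-entropy being the Bernoulli negative log-likelihood. The following is a minimal sketch in plain Python, not the authors' code. The function names (`animal_kfold`, `future_holdout`, `bernoulli_nll`, `binary_cross_entropy`), the random seed, and the 80/20 cut used for the future-data holdout are all assumptions made for illustration; the paper's actual split proportions are not given in the excerpt above.

```python
import math
import random


def animal_kfold(animal_ids, n_folds=5, seed=0):
    """Yield (train_animals, test_animals) sets.

    Each animal appears in exactly one test fold, so every trial from a
    held-out animal is unseen during training (hypothetical helper).
    """
    unique = sorted(set(animal_ids))
    random.Random(seed).shuffle(unique)
    folds = [unique[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = set(folds[k])
        yield set(unique) - test, test


def future_holdout(n_trials, train_fraction=0.8):
    """Split trial indices so test trials come strictly after train trials.

    The 0.8 fraction is an assumed value, not taken from the paper.
    """
    cut = int(n_trials * train_fraction)
    return list(range(cut)), list(range(cut, n_trials))


def bernoulli_nll(p, y):
    """Negative log-likelihood of a binary choice y under Bernoulli(p)."""
    return -math.log(p if y == 1 else 1.0 - p)


def binary_cross_entropy(p, y):
    """Binary cross-entropy for one trial with predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


if __name__ == "__main__":
    animals = [f"mouse_{i}" for i in range(12)]
    for train, test in animal_kfold(animals):
        print(sorted(test))
    print(future_holdout(10))
    print(binary_cross_entropy(0.7, 1), bernoulli_nll(0.7, 1))
```

Splitting by animal rather than by trial is what lets the held-out score measure generalization to new animals; a trial-level split would leak each animal's learning trajectory into training.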