Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Flexible inference for animal learning rules using neural networks

Authors: Yuhan Helena Liu, Victor Geadah, Jonathan Pillow

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional RL learning rules at predicting the learning trajectories of held-out mice.
Researcher Affiliation	Academia	Yuhan Helena Liu¹,*, Victor Geadah¹, and Jonathan Pillow¹; ¹Princeton University, Princeton, NJ, USA. *Correspondence: EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Learning Rule Inference from Trial-by-Trial Behavior
Open Source Code	Yes	Our code is available https://github.com/Helena-Yuhan-Liu/Infer Learning ANNGLM.
Open Datasets	Yes	For real data, we used the publicly available International Brain Laboratory (IBL) dataset [31], following the same preprocessing as in [19].
Dataset Splits	Yes	We evaluate model performance using 5-fold cross-validation, where each fold holds out a subset of animals entirely. ... Second, to assess temporal generalization, we adopt a future-data holdout strategy [103].
Hardware Specification	Yes	Simulations were run on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700H processor (14 cores, 20 threads) with a maximum clock speed of 2.30GHz.
Software Dependencies	Yes	We implemented our models in PyTorch v1.13.1 [105]. ... For statistical analysis and significance testing, we used tools from the SciPy package [106].
Experiment Setup	Yes	We tune hyperparameters, including the number of hidden units, number of layers, learning rate, and number of training epochs. We used a batch size of 1 for simulated data. For real IBL data, we trained on all animals jointly; using a batch size of 1 instead did not affect our conclusions. ... We used the Adam optimizer with a default learning rate of 1e-3, and binary cross-entropy loss, which corresponds to the negative log-likelihood under a Bernoulli model.
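
The Dataset Splits entry describes 5-fold cross-validation in which each fold holds out a subset of animals entirely. The sketch below shows one way such an animal-level split could be built, so that no animal contributes trials to both the training and the held-out side of a fold. It is an illustration only, not the authors' code: the function name animal_holdout_folds, the seed, and the mouse labels are all hypothetical, and the paper's actual fold assignment may differ.

```python
import random


def animal_holdout_folds(animal_ids, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) pairs in which each animal is held out exactly once.

    Hypothetical helper: splitting is done over animals, not over trials,
    so a held-out animal never appears in the training set of its fold.
    """
    ids = sorted(set(animal_ids))      # unique animals, in a deterministic order
    rng = random.Random(seed)          # local RNG so the split is repeatable
    rng.shuffle(ids)
    # Deal the shuffled animals round-robin into n_folds groups.
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        train_ids = [a for j, fold in enumerate(folds) if j != k for a in fold]
        yield train_ids, test_ids


# Made-up animal labels, used only to exercise the helper.
animals = [f"mouse_{i:02d}" for i in range(12)]
splits = list(animal_holdout_folds(animals))

for fold_index, (train_ids, test_ids) in enumerate(splits):
    print(fold_index, "held out:", test_ids)
```

With trial-level data, the same idea applies by selecting every trial whose animal label is in test_ids as the held-out set for that fold. Libraries such as scikit-learn offer grouped splitters for the same purpose; the standard-library version here keeps the sketch dependency-free.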