Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Squared families are useful conjugate priors
Authors: Russell Tsuchida, Jiawei Liu, Cheng Soon Ong, Dino Sejdinovic
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply such conjugate families to Bayesian regression in feature space using end-to-end learnable neural network features. Such a setting allows for a rich multi-modal alternative to Gaussian processes with neural network features, often called deep kernel learning. We demonstrate our method on few shot learning, outperforming existing neural methods based on Gaussian processes and normalising flows. |
| Researcher Affiliation | Academia | Russell Tsuchida Data Science and AI Group Monash University EMAIL Jiawei Liu School of Computing Australian National University EMAIL Cheng Soon Ong Data61, CSIRO and Australian National University EMAIL Dino Sejdinovic RAIR, AIML The University of Adelaide EMAIL |
| Pseudocode | Yes | Algorithm 1: Marginal likelihood (i.e., Proposition 2); Algorithm 2: GSFP posterior predictive update (i.e., Corollary 6); Algorithm 3: GSFP few-shot learning pretraining (i.e., paragraph 1 of Section 4) |
| Open Source Code | Yes | Code available at https://github.com/Carlisle-Liu/SNEFY-Process.git. |
| Open Datasets | Yes | Sines dataset [Finn et al., 2017] is comprised of input x from the interval [−5, 5] with corresponding label y = A sin(x + p) + ϵ, where amplitude A ∼ U[0.1, 5.0] and phase p ∼ U[0, π] are uniformly distributed on their respective intervals, and ϵ ∼ N(0, 0.1) is a Gaussian noise with 0 mean and a standard deviation of 0.1. Mixed-Noise Sines [Sendera et al., 2021] is a variant of the sines experiment that uses input-dependent noise: y = A sin(x + p) + \|x − p\| ϵ, where \|·\| is the absolute value function. NASDAQ100 Small Dataset [Qin et al., 2017] contains stock prices of 81 major corporations and index values of NASDAQ 100 from July 26 to December 22 in 2016. EEG Steady-State Visual Evoked Potential Signals Dataset [Fernandez-Fraga et al., 2019] contains the electroencephalogram (EEG) data from 29 subjects performing different visual tests. The time-series data signals of various lengths (owing to different test durations) are recorded at a frequency of 128 Hz. This dataset is available under a Creative Commons Attribution 4.0 International (CC-BY-4.0) licence. Queen Mary University of London Multiview Face Dataset (QMUL) [Gong et al., 1996] contains normalised facial images of 48 people. Power [Hebrail and Berard, 2006] dataset contains 2,075,259 electricity consumption readings in a house in Sceaux (France), spanning 47 months between December 2006 and November 2010. The time-series measurements of electricity consumption (watt per hour) are recorded at a frequency of one reading per minute on individual metres. This dataset is licenced under a Creative Commons Attribution 4.0 International (CC-BY-4.0) licence. The Boston Housing dataset [Harrison Jr and Rubinfeld, 1978] consists of 506 instances with 14 variables describing the socioeconomic characteristics of neighbourhoods in Boston, Massachusetts. We use 13 features as input variables and the remaining variable, MEDV (median value of owner-occupied homes in $1000s), serves as the regression target. This dataset is licensed under an Apache 2.0 open source license. The Concrete Compressive Strength dataset [Yeh, 1998] consists of 1,030 instances with 8 variables describing the compositional structure, the curation duration, and the compressive strength of concrete samples. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Energy Efficiency dataset [Tsanas and Xifara, 2012] consists of 768 samples with 8 features describing the design details of the buildings. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Kin8nm dataset is part of the Kin family of datasets [Corke, 2002] that consists of 8,192 instances with 8 features simulating the forward kinetics of a robotic arm. The Condition Based Maintenance of Naval Propulsion Plant dataset [Coraddu et al., 2014] consists of 11,934 samples generated from a numerical simulation of a naval vessel with a gas turbine propulsion plant. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Combined Cycle Power Plant dataset [Tüfekci, 2014] consists of 9,568 data points collected from a fully operational power plant over a period of 2006-2011. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Physicochemical Properties of Protein Tertiary Structure dataset [Rana, 2013] consists of 45,730 samples of decoy structures. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Wine Quality dataset [Cortez et al., 2009] has two sub-datasets regarding red wine and white wine from the Portuguese "Vinho Verde" region. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The Yacht Hydrodynamics dataset [Gerritsma et al., 1981] consists of 308 samples with 6 features derived from hull geometry and hydrodynamic context. This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. |
| Dataset Splits | Yes | In training, 10 input-label pairs are sampled with five each for support and query sets. Evaluation is performed on 500 inference iterations on 200 data points, with a 5/195 split ratio for support and query sets. Training sets and in-distribution evaluation sets are obtained by partitioning NASDAQ 100 index with a 70/30 split. EEG... is used for training and in-distribution evaluation following a partition ratio of 70/30. Power... 70% of the data, starting from the beginning of the time series, is used for both training and in-distribution evaluation, while the remaining 30% is used for out-of-distribution evaluation. The train/test split ratio of 80/20 is applied across the datasets as described in G.1–G.9. Five models are trained for each method on each dataset using random seeds ranging from 1 to 5. |
| Hardware Specification | Yes | Both training and evaluation are conducted with a single RTX 2080TI GPU and an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz. |
| Software Dependencies | No | GSFP is implemented using PyTorch and GPyTorch frameworks. All models, including DKT [Patacchiola et al., 2020], NGGP [Sendera et al., 2021] and GSFP, are trained with Adam optimiser [Kingma, 2014] using a fixed learning rate of 0.001 and default beta coefficients of β1 = 0.9 and β2 = 0.999 across all experiments. |
| Experiment Setup | Yes | The MLP has two fully-connected layers, with the first layer mapping X → ℝ^d followed by a ReLU activation function [Hahnloser et al., 2000], and the second one mapping ℝ^d → ℝ^d. The CNN has three convolutional layers with fixed kernel size of 3, stride of 2 and dilation of 2. Input channel number of the first convolutional layer is 3, and upsized to 36 in the two subsequent convolutional layers. Output channel number is 36 across all convolutional layers. ReLU activation function is appended to each of the first two convolutional layers. Output of the last convolutional layer is flattened into a d-dimensional vector. A scaled cosine function cos(·)/√d is appended as the final activation. The squared neural network consists of a hidden layer (ψ : Ω → ℝ^n, where Ω = ℝ^d) and readout parameters (Θ ∈ ℝ^{m×n}). The hidden layer is a fully connected layer ψ(ω) = σ(Wω + b) with a weight matrix W ∈ ℝ^{n×d} initialised from a standard normal distribution N(0, I), and a bias vector b ∈ ℝ^n initialised to be all ones. For the activation function σ, we use the Snake activation function [Ziyin et al., 2020]. Table 3: Summary of dimensions used across the experiments. All models, including DKT [Patacchiola et al., 2020], NGGP [Sendera et al., 2021] and GSFP, are trained with Adam optimiser [Kingma, 2014] using a fixed learning rate of 0.001 and default beta coefficients of β1 = 0.9 and β2 = 0.999 across all experiments. All models are trained on J = 10,000 training datasets that are randomly sampled from the meta dataset as detailed in F.1–F.6. The evaluation results of each meta model are averaged over 500 test datasets, with each entry representing the mean ± the standard deviation. Optimal performance of each meta model is obtained with grid search at a frequency of 500 training datasets. (a) The feature extractor is the two-layer MLP; (b) We fix d = 5, n = 2, and m = 1. Both (a) and (b) are applied across the nine regression experiments. |
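
The Sines task quoted in the Open Datasets and Dataset Splits rows can be illustrated with a short NumPy sketch. This is not the authors' code: the function name `sample_sines_task` and its parameters are hypothetical, and drawing the inputs uniformly on [−5, 5] is an assumption, since the excerpt only states the interval.

```python
import numpy as np


def sample_sines_task(rng, n_points=10, n_support=5):
    """Sample one sines task and split it into support and query sets.

    Hypothetical helper for illustration only; not taken from the paper's repository.
    """
    amplitude = rng.uniform(0.1, 5.0)  # A ~ U[0.1, 5.0]
    phase = rng.uniform(0.0, np.pi)    # p ~ U[0, pi]
    # Assumption: inputs are drawn uniformly from the stated interval [-5, 5].
    x = rng.uniform(-5.0, 5.0, size=n_points)
    # Gaussian noise with zero mean and standard deviation 0.1.
    eps = rng.normal(0.0, 0.1, size=n_points)
    y = amplitude * np.sin(x + phase) + eps
    support = (x[:n_support], y[:n_support])
    query = (x[n_support:], y[n_support:])
    return support, query


rng = np.random.default_rng(0)
# Training setting from the excerpt: 10 pairs, five each for support and query.
train_support, train_query = sample_sines_task(rng)
# Evaluation setting from the excerpt: 200 points with a 5/195 support/query split.
eval_support, eval_query = sample_sines_task(rng, n_points=200, n_support=5)
```

Under these assumptions, repeating the evaluation call 500 times would mirror the "500 inference iterations" described in the excerpt. The Mixed-Noise Sines variant is deliberately left out of the sketch.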