Gaussian Process Probes (GPP) for Uncertainty-Aware Probing

Authors: Zi Wang, Alexander Ku, Jason Baldridge, Tom Griffiths, Been Kim

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate GPP on datasets containing both synthetic and real images. Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do."
Researcher Affiliation | Collaboration | Zi Wang (Google DeepMind), Alexander Ku (Google DeepMind), Jason Baldridge (Google DeepMind), Thomas L. Griffiths (Princeton University), Been Kim (Google DeepMind)
Pseudocode | No | The paper describes the GPP method using mathematical equations and prose, but it does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code can be found at https://github.com/google-research/gpax."
Open Datasets | Yes | "We use 3D Shapes [Burgess and Kim, 2018] and construct 3 datasets based on its concept ontology... In-distribution (ID) queries are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]... These superclasses are defined by building a tree using the WordNet hierarchy [Miller, 1994]."
Dataset Splits | Yes | "Queries and observations in the real-world datasets are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]. Queries are sampled disjointly from the training data and observations."
Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models, memory, or cloud computing instance types used for the experiments.
Software Dependencies | No | The paper mentions software concepts like "CNN models" and "linear probes" but does not specify any particular software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | "For GPP, we use Beta(0.1, 0.1) as the prior and set strength to be 5. We train three CNN models on labels generated from this ontology: 1) M.1 is trained on 64 labels..., 2) M.2 is trained on 8 labels..., 3) M.3 is trained on 8 labels."
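To make the quoted setup concrete: the following is a minimal, illustrative sketch (not the authors' gpax implementation) of a Beta-Bernoulli probe using the Beta(0.1, 0.1) prior quoted above. It shows how a conjugate posterior separates the two uncertainties the paper measures: epistemic uncertainty (posterior variance, which shrinks as observations accumulate) and an aleatory proxy (expected Bernoulli entropy, which stays high when the concept itself is fuzzy). The function name and the entropy-based aleatory proxy are assumptions for illustration only.

```python
import math

def beta_probe(labels, a0=0.1, b0=0.1):
    """Update a Beta(a0, b0) prior with binary concept labels.

    Returns the posterior mean of the concept probability, the
    posterior variance (epistemic uncertainty), and the Bernoulli
    entropy at the mean (a proxy for aleatory uncertainty).
    """
    a = a0 + sum(labels)
    b = b0 + len(labels) - sum(labels)
    mean = a / (a + b)                                 # concept probability
    epistemic = a * b / ((a + b) ** 2 * (a + b + 1))   # Beta variance
    aleatory = -(mean * math.log(mean) + (1 - mean) * math.log(1 - mean))
    return mean, epistemic, aleatory

# With only a few examples the epistemic term stays high; with many
# consistently mixed labels it collapses, while the aleatory term
# remains large because the concept is genuinely fuzzy.
few = beta_probe([1, 0, 1])
many = beta_probe([1, 0, 1] * 50)
assert few[1] > many[1]  # more data -> less epistemic uncertainty
```

This mirrors the paper's claim (1) above: a probe with a weak prior can report honest, high epistemic uncertainty when given only a handful of examples, rather than an overconfident point estimate.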