Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Authors: Zi Wang, Alexander Ku, Jason Baldridge, Tom Griffiths, Been Kim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate GPP on datasets containing both synthetic and real images. Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do. |
| Researcher Affiliation | Collaboration | Zi Wang (Google DeepMind), Alexander Ku (Google DeepMind), Jason Baldridge (Google DeepMind), Thomas L. Griffiths (Princeton University), Been Kim (Google DeepMind) |
| Pseudocode | No | The paper describes the GPP method using mathematical equations and prose, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/google-research/gpax. |
| Open Datasets | Yes | We use 3D Shapes [Burgess and Kim, 2018] and construct 3 datasets based on its concept ontology... In-distribution (ID) queries are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]... These superclasses are defined by building a tree using the WordNet hierarchy [Miller, 1994]. |
| Dataset Splits | Yes | Queries and observations in the real-world datasets are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]. Queries are sampled disjointly from the training data and observations. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models, memory, or cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using software concepts like 'CNN models' and 'linear probes' but does not specify any particular software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For GPP, we use Beta(0.1, 0.1) as the prior and set strength to be 5. We train three CNN models on labels generated from this ontology: 1) M.1 is trained on 64 labels..., 2) M.2 is trained on 8 labels..., 3) M.3 is trained on 8 labels. |
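The quoted setup names a Beta(0.1, 0.1) prior and a strength of 5. The exact role these hyperparameters play inside GPP is defined in the paper itself; as a generic illustration only, the sketch below shows the standard conjugate Beta-Bernoulli update, treating the strength as a hypothetical pseudo-count budget split between successes and failures. None of the variable names here come from the gpax codebase.

```python
from math import sqrt

def beta_posterior(alpha: float, beta: float, successes: float, failures: float):
    """Conjugate update of a Beta(alpha, beta) prior with Bernoulli counts.

    Returns the posterior mean and variance of the success probability.
    """
    a, b = alpha + successes, beta + failures
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Beta(0.1, 0.1) prior as quoted from the paper; splitting a hypothetical
# strength of 5 into 4 positive and 1 negative pseudo-observations.
mean, var = beta_posterior(0.1, 0.1, successes=4, failures=1)
print(f"posterior mean={mean:.3f}, std={sqrt(var):.3f}")
```

A Beta(0.1, 0.1) prior is bimodal (mass near 0 and 1), so even a handful of observations moves the posterior mean sharply toward the empirical frequency, which is consistent with the paper's claim that GPP works with very few examples.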