Gaussian Process Probes (GPP) for Uncertainty-Aware Probing
Authors: Zi Wang, Alexander Ku, Jason Baldridge, Tom Griffiths, Been Kim
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate GPP on datasets containing both synthetic and real images. Our experiments show it can (1) probe a model's representations of concepts even with a very small number of examples, (2) accurately measure both epistemic uncertainty (how confident the probe is) and aleatory uncertainty (how fuzzy the concepts are to the model), and (3) detect out-of-distribution data using those uncertainty measures as well as classic methods do. |
| Researcher Affiliation | Collaboration | Zi Wang (Google DeepMind), Alexander Ku (Google DeepMind), Jason Baldridge (Google DeepMind), Thomas L. Griffiths (Princeton University), Been Kim (Google DeepMind) |
| Pseudocode | No | The paper describes the GPP method using mathematical equations and prose, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/google-research/gpax. |
| Open Datasets | Yes | We use 3D Shapes [Burgess and Kim, 2018] and construct 3 datasets based on its concept ontology... In-distribution (ID) queries are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]... These superclasses are defined by building a tree using the WordNet hierarchy [Miller, 1994]. |
| Dataset Splits | Yes | Queries and observations in the real-world datasets are sampled disjointly from the validation split of the ImageNet dataset [Russakovsky et al., 2015]. Queries are sampled disjointly from the training data and observations. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models, memory, or cloud computing instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using software concepts like 'CNN models' and 'linear probes' but does not specify any particular software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | For GPP, we use Beta(0.1, 0.1) as the prior and set strength to be 5. We train three CNN models on labels generated from this ontology: 1) M.1 is trained on 64 labels..., 2) M.2 is trained on 8 labels..., 3) M.3 is trained on 8 labels. |
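The quoted setup names a Beta(0.1, 0.1) prior and a strength of 5. The exact role these hyperparameters play inside GPP is defined in the paper itself; as a generic illustration only, the sketch below shows the standard conjugate Beta-Bernoulli update, treating the strength as a hypothetical pseudo-count budget split between successes and failures. None of the variable names here come from the gpax codebase.

```python
from math import sqrt

def beta_posterior(alpha: float, beta: float, successes: float, failures: float):
    """Conjugate update of a Beta(alpha, beta) prior with Bernoulli counts.

    Returns the posterior mean and variance of the success probability.
    """
    a, b = alpha + successes, beta + failures
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Beta(0.1, 0.1) prior as quoted from the paper; splitting a hypothetical
# strength of 5 into 4 positive and 1 negative pseudo-observations.
mean, var = beta_posterior(0.1, 0.1, successes=4, failures=1)
print(f"posterior mean={mean:.3f}, std={sqrt(var):.3f}")
```

A Beta(0.1, 0.1) prior is bimodal (mass near 0 and 1), so even a handful of observations moves the posterior mean sharply toward the empirical frequency, which is consistent with the paper's claim that GPP works with very few examples.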