A Latent-Variable Model for Intrinsic Probing

Authors: Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that our model is versatile and yields tighter mutual information estimates than two intrinsic probes previously proposed in the literature. Finally, we find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
Researcher Affiliation | Collaboration | Karolina Stańczak*¹, Lucas Torroba Hennigen*², Adina Williams³, Ryan Cotterell⁴, Isabelle Augenstein¹ (¹University of Copenhagen, ²Massachusetts Institute of Technology, ³Meta AI, ⁴ETH Zürich)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code necessary to replicate our experiments is available at https://github.com/copenlu/flexible-probing.
Open Datasets | Yes | This consists of first automatically mapping treebanks from UD v2.1 (Nivre et al. 2017) to the UniMorph (McCarthy et al. 2018) schema. (See the treebank-reading sketch below.)
Dataset Splits | Yes | We estimate our probes' parameters using the UD training set and conduct greedy selection to approximate the objective in Eq. (1) on the validation set; finally, we report the results on the test set. ... Early stopping is conducted by holding out 10% of the training data; our development set is reserved for the greedy selection of subsets of neurons. (See the split and greedy-selection sketches below.)
Hardware Specification | Yes | For comparison, in English number, on an Nvidia RTX 2070 GPU, our POISSON, GAUSSIAN and LINEAR experiments take a few minutes or even seconds to run, compared to UPPER BOUND which takes multiple hours.
Software Dependencies | No | The paper states 'Our implementation is built with PyTorch (Paszke et al. 2019)' but does not provide a specific version number for PyTorch or other software dependencies.
Experiment Setup | Yes | We train our probes for a maximum of 2000 epochs using the Adam optimizer (Kingma and Ba 2015). We add early stopping with a patience of 50 as a regularization technique. ... we set λ₁, λ₂ = 10⁻⁵ for all probes. ... a slight improvement in the performance of POISSON and CONDITIONAL POISSON was obtained by scaling the entropy term in Eq. (3) by a factor of 0.01. (See the training-loop sketch below.)
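
The "Open Datasets" row refers to the paper's pipeline of mapping UD v2.1 treebanks to the UniMorph schema via the converter of McCarthy et al. (2018). As a minimal sketch of the raw UD side only, the following reads word-level morphological feature bundles from a treebank with the `conllu` package; the package choice and file path are illustrative assumptions, not part of the authors' released code.

```python
import conllu  # pip install conllu

# Hypothetical path; any UD v2.1 .conllu treebank works here.
with open("en_ewt-ud-train.conllu", encoding="utf-8") as f:
    sentences = conllu.parse(f.read())

# Each token carries a UD feature bundle (the FEATS column); the paper
# then maps these bundles to UniMorph tags with the converter of
# McCarthy et al. (2018).
pairs = [(token["form"], token["feats"] or {})
         for sentence in sentences
         for token in sentence]
print(pairs[:5])
```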
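The "Dataset Splits" row implies a specific division of the data: probe parameters are fit on the UD training set minus a 10% slice held out for early stopping, while the official development set is reserved for greedy neuron selection. A minimal sketch of that 10% holdout, assuming representations and labels are already stored as tensors (the function name and seed are illustrative, not from the paper):

```python
import torch

def split_for_early_stopping(reprs, labels, holdout_frac=0.1, seed=0):
    """Hold out a fraction of the training data for early stopping,
    leaving the official UD dev set free for greedy neuron selection."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(reprs.shape[0], generator=generator)
    n_holdout = int(holdout_frac * reprs.shape[0])
    holdout, fit = perm[:n_holdout], perm[n_holdout:]
    return (reprs[fit], labels[fit]), (reprs[holdout], labels[holdout])
```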
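The same row mentions greedy selection of neuron subsets against the objective in Eq. (1). The paper's objective is not reproduced on this page, so the following is only a generic forward-selection sketch in which `score` stands in for the paper's validation-set objective:

```python
def greedy_select(score, num_neurons, k):
    """Generic greedy forward selection: at each step, add the neuron
    whose inclusion most improves the validation score of the subset.
    `score` is a placeholder for the paper's Eq. (1) objective."""
    selected, remaining = [], set(range(num_neurons))
    for _ in range(k):
        best = max(remaining, key=lambda d: score(selected + [d]))
        selected.append(best)
        remaining.discard(best)
    return selected
```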
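Finally, the "Experiment Setup" row pins down the optimization recipe: Adam, at most 2000 epochs, early stopping with patience 50, and L1/L2 penalties with λ₁ = λ₂ = 10⁻⁵. A self-contained PyTorch training-loop sketch under those settings follows; the loss function, learning rate, and full-batch updates are assumptions, since the paper does not specify them here.

```python
import torch
import torch.nn as nn

def train_probe(probe, train_x, train_y, es_x, es_y,
                max_epochs=2000, patience=50,
                lam1=1e-5, lam2=1e-5, lr=1e-3):
    """Train a probe with Adam, L1/L2 weight penalties, and early
    stopping on a held-out slice of the training data."""
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_loss, best_state, stale = float("inf"), None, 0

    for epoch in range(max_epochs):
        probe.train()
        optimizer.zero_grad()
        # L1 and L2 regularization terms over all probe parameters.
        reg = sum(lam1 * p.abs().sum() + lam2 * p.pow(2).sum()
                  for p in probe.parameters())
        loss = loss_fn(probe(train_x), train_y) + reg
        loss.backward()
        optimizer.step()

        # Early stopping: monitor loss on the held-out slice.
        probe.eval()
        with torch.no_grad():
            es_loss = loss_fn(probe(es_x), es_y).item()
        if es_loss < best_loss:
            best_loss, stale = es_loss, 0
            best_state = {k: v.clone() for k, v in probe.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:
                break

    if best_state is not None:
        probe.load_state_dict(best_state)
    return probe
```

For instance, `probe = nn.Linear(768, num_classes)` over frozen contextual representations would slot directly into this loop; restoring the best checkpoint after the patience window expires keeps the probe that generalized best to the held-out slice.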