A Latent-Variable Model for Intrinsic Probing
Authors: Karolina Stańczak, Lucas Torroba Hennigen, Adina Williams, Ryan Cotterell, Isabelle Augenstein
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that our model is versatile and yields tighter mutual information estimates than two intrinsic probes previously proposed in the literature. Finally, we find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax. |
| Researcher Affiliation | Collaboration | Karolina Stańczak*1, Lucas Torroba Hennigen*2, Adina Williams3, Ryan Cotterell4, Isabelle Augenstein1 — 1University of Copenhagen, 2Massachusetts Institute of Technology, 3Meta AI, 4ETH Zürich |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code necessary to replicate our experiments is available at https://github.com/copenlu/flexible-probing. |
| Open Datasets | Yes | This consists of first automatically mapping treebanks from UD v2.1 (Nivre et al. 2017) to the UniMorph (McCarthy et al. 2018) schema. |
| Dataset Splits | Yes | We estimate our probes' parameters using the UD training set and conduct greedy selection to approximate the objective in Eq. (1) on the validation set; finally, we report the results on the test set. ... Early stopping is conducted by holding out 10% of the training data; our development set is reserved for the greedy selection of subsets of neurons. (A hypothetical sketch of this greedy selection appears after the table.) |
| Hardware Specification | Yes | For comparison, in English number, on an Nvidia RTX 2070 GPU, our POISSON, GAUSSIAN and LINEAR experiments take a few minutes or even seconds to run, compared to UPPER BOUND which takes multiple hours. |
| Software Dependencies | No | The paper states 'Our implementation is built with PyTorch (Paszke et al. 2019)' but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We train our probes for a maximum of 2000 epochs using the Adam optimizer (Kingma and Ba 2015). We add early stopping with a patience of 50 as a regularization technique. ... we set λ1, λ2 = 10⁻⁵ for all probes. ... a slight improvement in the performance of POISSON and CONDITIONAL POISSON was obtained by scaling the entropy term in Eq. (3) by a factor of 0.01. (An illustrative training-loop sketch follows the table.) |
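
The greedy subset selection quoted in the Dataset Splits row can be summarised in a few lines. The sketch below assumes a hypothetical `estimate_mi(dims, val_data)` callable that fits or evaluates a probe restricted to the given dimensions and returns its mutual information estimate on the validation set; it is not the authors' released code.

```python
def greedy_select(estimate_mi, num_dims, budget, val_data):
    """Greedily pick `budget` dimensions that maximise the validation MI estimate."""
    selected = []
    remaining = set(range(num_dims))
    for _ in range(budget):
        # Score every candidate dimension when added to the current subset,
        # then keep the one that yields the largest MI estimate.
        best_dim = max(remaining, key=lambda d: estimate_mi(selected + [d], val_data))
        selected.append(best_dim)
        remaining.remove(best_dim)
    return selected
```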
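The Experiment Setup row can likewise be read as a training configuration. The following PyTorch sketch reflects the reported hyperparameters (Adam, at most 2000 epochs, early-stopping patience of 50, λ1 = λ2 = 10⁻⁵ for L1/L2 regularisation); the probe module, loss function, and data loaders are placeholders, and the details are assumptions rather than the published implementation.

```python
import torch

LAMBDA1 = LAMBDA2 = 1e-5   # reported regularisation weights
MAX_EPOCHS, PATIENCE = 2000, 50

def train_probe(probe, loss_fn, train_loader, heldout_loader):
    """Train a probe with Adam, L1/L2 penalties, and early stopping (sketch)."""
    optimizer = torch.optim.Adam(probe.parameters())
    best_heldout, epochs_without_improvement = float("inf"), 0
    for epoch in range(MAX_EPOCHS):
        probe.train()
        for x, y in train_loader:
            loss = loss_fn(probe(x), y)
            # L1/L2 penalties on the probe parameters.
            for p in probe.parameters():
                loss = loss + LAMBDA1 * p.abs().sum() + LAMBDA2 * p.pow(2).sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Early stopping on the held-out 10% of the training data.
        probe.eval()
        with torch.no_grad():
            heldout = sum(loss_fn(probe(x), y).item() for x, y in heldout_loader)
        if heldout < best_heldout:
            best_heldout, epochs_without_improvement = heldout, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= PATIENCE:
                break
    return probe
```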