A Theory of Unsupervised Translation Motivated by Understanding Animal Communication

Authors: Shafi Goldwasser, David Gruber, Adam Tauman Kalai, Orr Paradise

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We exemplify this theory with two stylized models of language, for which our framework provides bounds on necessary sample complexity; the bounds are formally proven and experimentally verified on synthetic data.
Researcher Affiliation | Collaboration | Shafi Goldwasser, UC Berkeley & Project CETI, shafi.goldwasser@berkeley.edu; David F. Gruber, Project CETI, david@projectceti.org; Adam Tauman Kalai, Microsoft Research & Project CETI, adam@kal.ai; Orr Paradise, UC Berkeley & Project CETI, orrp@eecs.berkeley.edu
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides formal mathematical definitions and proofs.
Open Source Code | Yes | Code can be found at https://github.com/orrp/theory-of-umt.
Open Datasets | No | The paper states, "We validate our theorems generating synthetic data from randomly-generated languages according to each model". While the generation process is described and the code is available, the data itself is synthetic and not provided as a pre-existing public dataset with specific access information (URL, DOI, citation).
Dataset Splits | Yes | Number of validation data: 1000 (Figure 8).
Hardware Specification | Yes | The experiments were run in parallel on an AWS r6i.4xlarge.
Software Dependencies | No | The paper mentions using the GPT-3 API for certain examples, but it does not specify version numbers for any software, libraries, or dependencies used to run the experiments.
Experiment Setup | Yes | Figure 7: Parameters for experiments in the knowledge graph model (Figure 4). ... Figure 8: Parameters for experiments in the common nonsense model (Figure 5). (These figures list specific values for various parameters defining the experimental setup, such as number of nodes, edge density, agreement parameter, and number of samples.)
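To make the kinds of parameters named in the Experiment Setup row concrete, the following is a minimal, hypothetical sketch of how a number of nodes, an edge density, an agreement parameter, and a number of samples could together define a synthetic graph experiment. It is not the authors' procedure and is not taken from their repository: the function names (`random_digraph`, `perturbed_copy`, `sample_edges`), the interpretation of "agreement" as a per-pair probability of keeping the original edge status, and all numeric values are assumptions made purely for illustration.

```python
import random


def random_digraph(num_nodes, edge_density, rng):
    """Directed graph on nodes 0..num_nodes-1 (hypothetical helper).

    Each ordered pair (u, v) with u != v is included independently
    with probability edge_density.
    """
    return {
        (u, v)
        for u in range(num_nodes)
        for v in range(num_nodes)
        if u != v and rng.random() < edge_density
    }


def perturbed_copy(edges, num_nodes, edge_density, agreement, rng):
    """Second graph that partially agrees with the first (assumed meaning).

    For each ordered pair, keep the original edge status with probability
    `agreement`; otherwise resample it at rate edge_density.
    """
    out = set()
    for u in range(num_nodes):
        for v in range(num_nodes):
            if u == v:
                continue
            if rng.random() < agreement:
                present = (u, v) in edges
            else:
                present = rng.random() < edge_density
            if present:
                out.add((u, v))
    return out


def sample_edges(edges, num_samples, rng):
    """Draw num_samples edges uniformly with replacement (graph must be non-empty)."""
    pool = sorted(edges)  # sorted for reproducibility under a fixed seed
    return [rng.choice(pool) for _ in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; all values below are illustrative only
    truth = random_digraph(num_nodes=10, edge_density=0.5, rng=rng)
    prior = perturbed_copy(truth, num_nodes=10, edge_density=0.5, agreement=0.8, rng=rng)
    samples = sample_edges(truth, num_samples=20, rng=rng)
    print(len(truth), len(prior), len(samples))
```

The actual parameter values and generation procedure are those listed in the paper's Figures 7 and 8 and implemented in the linked repository; the sketch above only illustrates how such parameters might interact.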