Meaning Representations from Trajectories in Autoregressive Models
Authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Lastly, our method shows an improvement in performance that correlates with model size, suggesting that further performance gains could be obtained as larger/better autoregressive models are used. |
| Researcher Affiliation | Collaboration | Tian Yu Liu (UCLA, tianyu@cs.ucla.edu); Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto (AWS AI Labs, {mttrager,aachille,pramudi,zancato,soattos}@amazon.com) |
| Pseudocode | Yes | Algorithm 1 (Similarity in Meaning Space). Require: model M, strings u and v, num. trajectories n, max trajectory length m, distance d. T_u ← sample n trajectories from u, each up to [EOS] or length m, whichever occurs sooner; T_v ← likewise from v. Initialize M_u = M_v = ∅. For each trajectory t = a_1 … a_{m_t} ∈ T_u ∪ T_v: M_u[t] ← (∏_{i=1}^{m_t} P_M(a_i \| u · a_1 … a_{i−1}))^{1/m_t}; M_v[t] ← (∏_{i=1}^{m_t} P_M(a_i \| v · a_1 … a_{i−1}))^{1/m_t}. Return d(M_u, M_v) as the similarity score. |
| Open Source Code | Yes | Our code is available at: https://github.com/tianyu139/meaning-as-trajectories |
| Open Datasets | Yes | Semantic Textual Similarity (STS) (Agirre et al., 2012; 2013; 2014; 2015; 2016; Cer et al., 2017): The STS dataset scores how similar two pieces of text are. Stanford Natural Language Inference (SNLI) (Bowman et al., 2015): SNLI labels pairs of strings based on the categories {entailment, neutral, contradiction}. WordNet (Miller, 1995): WordNet establishes a hierarchy among English words through semantics-based hypernym/hyponym relations. Crisscrossed Captions (CxC) (Parekh et al., 2020): CxC extends MS-COCO (Lin et al., 2014) with human-labelled semantic similarity scores ranging from 0-5 for image-image, caption-caption, and image-text pairs. |
| Dataset Splits | Yes | Distance metric and hyperparameter choices for semantic similarity are based on a search using the validation set of the STS-B dataset, and are then fixed when evaluating on all test datasets. |
| Hardware Specification | No | No specific hardware (GPU models, CPU models, specific cloud instances) used for experiments is mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned. |
| Experiment Setup | Yes | We use eq. (2) as our distance function. We report results using other metrics/divergences in the Appendix. We use multinomial sampling for all experiments on our method with sampling temperature λ = 1.0. We set n = 20 and m = 20 for sampling trajectories, based on ablations in Appendix A. Distance metric and hyperparameter choices for semantic similarity are based on a search using the validation set of the STS-B dataset, and are then fixed when evaluating on all test datasets. |
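The algorithm in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: `toy_model`, the tiny character vocabulary, and the negative-L1 comparison of likelihood profiles are assumptions standing in for a real autoregressive LM and the paper's eq. (2) distance.

```python
import math
import random

# Toy stand-in for an autoregressive LM: maps a context string to a
# next-token distribution. A real model's conditionals would go here.
def toy_model(context):
    return {"a": 0.4, "b": 0.4, ".": 0.2}

def sample_trajectory(model, prefix, max_len, eos="."):
    """Sample one continuation of `prefix`, stopping at `eos` or `max_len`."""
    trajectory, context = [], prefix
    for _ in range(max_len):
        probs = model(context)
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        trajectory.append(token)
        context += token
        if token == eos:
            break
    return trajectory

def normalized_likelihood(model, prefix, trajectory):
    """Length-normalized (geometric-mean) probability of `trajectory` given `prefix`."""
    log_p, context = 0.0, prefix
    for token in trajectory:
        log_p += math.log(model(context)[token])
        context += token
    return math.exp(log_p / len(trajectory))

def meaning_similarity(model, u, v, n=20, m=20):
    """Compare u and v by how the model scores a shared pool of trajectories."""
    trajectories = [sample_trajectory(model, u, m) for _ in range(n)]
    trajectories += [sample_trajectory(model, v, m) for _ in range(n)]
    mu = [normalized_likelihood(model, u, t) for t in trajectories]
    mv = [normalized_likelihood(model, v, t) for t in trajectories]
    # Placeholder distance: negative L1 gap between the two likelihood
    # profiles (identical strings score 0, the maximum).
    return -sum(abs(a - b) for a, b in zip(mu, mv))
```

With the defaults n = 20 and m = 20 this mirrors the hyperparameters in the Experiment Setup row; swapping `toy_model` for a sampled LLM and the placeholder distance for the paper's metric recovers the described pipeline.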