Meaning Representations from Trajectories in Autoregressive Models
Authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Lastly, our method shows an improvement in performance that correlates with model size, suggesting that further performance gains could be obtained as larger/better autoregressive models are used. |
| Researcher Affiliation | Collaboration | Tian Yu Liu (UCLA, tianyu@cs.ucla.edu); Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto (AWS AI Labs, {mttrager,aachille,pramudi,zancato,soattos}@amazon.com) |
| Pseudocode | Yes | Algorithm 1 (Similarity in Meaning Space). Require: model M, strings u and v, num. trajectories n, max trajectory length m, distance d. T_u ← sample n trajectories from u, each up to [EOS] or length m, whichever occurs sooner; T_v ← likewise from v. Initialize M_u = M_v = ∅. For each trajectory t = a_1 … a_{m_t} ∈ T_u ∪ T_v: M_u[t] ← (∏_{i=1}^{m_t} P_M(a_i \| u · a_1 … a_{i−1}))^{1/m_t}; M_v[t] ← (∏_{i=1}^{m_t} P_M(a_i \| v · a_1 … a_{i−1}))^{1/m_t}. Return d(M_u, M_v) as the similarity score. |
| Open Source Code | Yes | Our code is available at: https://github.com/tianyu139/meaning-as-trajectories |
| Open Datasets | Yes | Semantic Textual Similarity (STS) (Agirre et al., 2012; 2013; 2014; 2015; 2016; Cer et al., 2017): The STS dataset scores how similar two pieces of text are. Stanford Natural Language Inference (SNLI) (Bowman et al., 2015): SNLI labels pairs of strings based on the categories {entailment, neutral, contradiction}. WordNet (Miller, 1995): WordNet establishes a hierarchy among English words through semantics-based hypernym/hyponym relations. Crisscrossed Captions (CxC) (Parekh et al., 2020): CxC extends MS-COCO (Lin et al., 2014) with human-labelled semantic similarity scores ranging from 0-5 for image-image, caption-caption, and image-text pairs. |
| Dataset Splits | Yes | Distance metric and hyperparameter choices for semantic similarity are based on a search using the validation set of the STS-B dataset, and are then fixed when evaluating on all test datasets. |
| Hardware Specification | No | No specific hardware (GPU models, CPU models, specific cloud instances) used for experiments is mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are mentioned. |
| Experiment Setup | Yes | We use eq. (2) as our distance function. We report results using other metrics/divergences in the Appendix. We use multinomial sampling for all experiments on our method with sampling temperature λ = 1.0. We set n = 20 and m = 20 for sampling trajectories, based on ablations in Appendix A. Distance metric and hyperparameter choices for semantic similarity are based on a search using the validation set of the STS-B dataset, and are then fixed when evaluating on all test datasets. |
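The algorithm in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: `toy_model`, the tiny character vocabulary, and the negative-L1 comparison of likelihood profiles are assumptions standing in for a real autoregressive LM and the paper's eq. (2) distance.

```python
import math
import random

# Toy stand-in for an autoregressive LM: maps a context string to a
# next-token distribution. A real model's conditionals would go here.
def toy_model(context):
    return {"a": 0.4, "b": 0.4, ".": 0.2}

def sample_trajectory(model, prefix, max_len, eos="."):
    """Sample one continuation of `prefix`, stopping at `eos` or `max_len`."""
    trajectory, context = [], prefix
    for _ in range(max_len):
        probs = model(context)
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        trajectory.append(token)
        context += token
        if token == eos:
            break
    return trajectory

def normalized_likelihood(model, prefix, trajectory):
    """Length-normalized (geometric-mean) probability of `trajectory` given `prefix`."""
    log_p, context = 0.0, prefix
    for token in trajectory:
        log_p += math.log(model(context)[token])
        context += token
    return math.exp(log_p / len(trajectory))

def meaning_similarity(model, u, v, n=20, m=20):
    """Compare u and v by how the model scores a shared pool of trajectories."""
    trajectories = [sample_trajectory(model, u, m) for _ in range(n)]
    trajectories += [sample_trajectory(model, v, m) for _ in range(n)]
    mu = [normalized_likelihood(model, u, t) for t in trajectories]
    mv = [normalized_likelihood(model, v, t) for t in trajectories]
    # Placeholder distance: negative L1 gap between the two likelihood
    # profiles (identical strings score 0, the maximum).
    return -sum(abs(a - b) for a, b in zip(mu, mv))
```

With the defaults n = 20 and m = 20 this mirrors the hyperparameters in the Experiment Setup row; swapping `toy_model` for a sampled LLM and the placeholder distance for the paper's metric recovers the described pipeline.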