GiLOT: Interpreting Generative Language Models via Optimal Transport
Authors: Xuhong Li, Jiamin Chen, Yekun Chai, Haoyi Xiong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have carried out extensive experiments on top of Llama families and their fine-tuned derivatives across various scales to validate the effectiveness of GiLOT for estimating the input attributions. The results show that GiLOT outperforms existing solutions on a number of faithfulness metrics under fair comparison settings. |
| Researcher Affiliation | Industry | 1Baidu Inc., Beijing, China. Correspondence to: Haoyi Xiong <haoyi.xiong.fr@ieee.org>. |
| Pseudocode | No | The paper describes algorithms and calculations within the text and equations, but it does not contain a formally structured pseudocode or algorithm block (e.g., labeled "Algorithm 1" or a code-like formatted procedure). |
| Open Source Code | Yes | Source code is publicly available at https://github.com/holyseven/GiLOT. |
| Open Datasets | Yes | The evaluation set is composed of 100 prompts from Alpaca (Taori et al., 2023) and ShareGPT. ... In our experiment, we use the SST-2 dataset (Socher et al., 2013) and LLaMA-13b (Touvron et al., 2023a) under the in-context scenario for the binary sentiment classification task. |
| Dataset Splits | No | The paper mentions varying mask rates and evaluation metrics but does not provide specific numerical training, validation, or test dataset splits (e.g., 80/10/10 percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper mentions that the solver is "highly parallelized and implemented on GPU" in Section 3.4.3, but it does not specify any particular GPU model (e.g., NVIDIA A100, RTX 3090) or other hardware details such as CPU, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions software components like "Sinkhorn-Knopp" algorithm and "IPOT", and models like "Llama families" and "Alpaca", but it does not provide specific version numbers for these or any other key software libraries (e.g., PyTorch 1.x, Transformers 4.x). |
| Experiment Setup | Yes | To efficiently compute Eq. (3) in practice, we choose top-b sequences that have the highest probabilities among all possible sequences, through the beam search decoding strategy (Wiseman & Rush, 2016). ... In most cases, we use settings of b = 10 and J = 10, meaning the top 10 nine-token sequences from beam search. ... when k is set to 100 in most of our experiments. ... Here we set the greedily generated T tokens as the output sequence {y^j_[1,T]}. |
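The Software Dependencies row above notes that the paper relies on the Sinkhorn-Knopp algorithm (and IPOT) to solve its optimal-transport problem without giving implementation details. As a point of reference, a minimal NumPy sketch of entropy-regularized optimal transport via Sinkhorn-Knopp is shown below; the function name, marginals, cost matrix, and hyperparameters here are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iters=200):
    """Entropy-regularized OT via Sinkhorn-Knopp iterations (illustrative sketch).

    a, b : source/target marginal distributions (1-D arrays summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    reg  : entropic regularization strength
    Returns a transport plan P whose row sums match a and column sums approach b.
    """
    K = np.exp(-C / reg)              # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # rescale columns toward marginal b
        u = a / (K @ v)               # rescale rows toward marginal a
    return u[:, None] * K * v[None, :]

# Toy example: transport mass between two 3-point distributions.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
P = sinkhorn(a, b, C)
```

In GiLOT's setting, the marginals would correspond to the model's output distributions before and after perturbing an input token, but those details are specific to the paper and not reproduced here.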