GiLOT: Interpreting Generative Language Models via Optimal Transport

Authors: Xuhong Li, Jiamin Chen, Yekun Chai, Haoyi Xiong

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have carried out extensive experiments on top of Llama families and their fine-tuned derivatives across various scales to validate the effectiveness of GiLOT for estimating the input attributions. The results show that GiLOT outperforms existing solutions on a number of faithfulness metrics under fair comparison settings.
Researcher Affiliation | Industry | Baidu Inc., Beijing, China. Correspondence to: Haoyi Xiong <haoyi.xiong.fr@ieee.org>.
Pseudocode | No | The paper describes algorithms and calculations within the text and equations, but it does not contain a formally structured pseudocode or algorithm block (e.g., labeled "Algorithm 1" or a code-like formatted procedure).
Open Source Code | Yes | Source code is publicly available at https://github.com/holyseven/GiLOT.
Open Datasets | Yes | The evaluation set is composed of 100 prompts from Alpaca (Taori et al., 2023) and ShareGPT. ... In our experiment, we use the SST-2 dataset (Socher et al., 2013) and LLaMA-13b (Touvron et al., 2023a) under the in-context scenario for the binary sentiment classification task.
Dataset Splits | No | The paper mentions varying mask rates and evaluation metrics but does not provide specific numerical training, validation, or test dataset splits (e.g., 80/10/10 percentages or sample counts) needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions that the solver is "highly parallelized and implemented on GPU" in Section 3.4.3, but it does not specify any particular GPU model (e.g., NVIDIA A100, RTX 3090) or other hardware details such as CPU, memory, or cloud instance types.
Software Dependencies | No | The paper mentions software components like the "Sinkhorn-Knopp" algorithm and "IPOT", and models like "Llama families" and "Alpaca", but it does not provide specific version numbers for these or any other key software libraries (e.g., PyTorch 1.x, Transformers 4.x).
Experiment Setup | Yes | To efficiently compute Eq. (3) in practice, we choose top-b sequences that have the highest probabilities among all possible sequences, through the beam search decoding strategy (Wiseman & Rush, 2016). ... In most cases, we use settings of b = 10 and J = 10, meaning the top 10 nine-token sequences from beam search. ... when k is set to 100 in most of our experiments. ... Here we set the greedily generated T tokens as the output sequence {y^j_[1,T]}.
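The rows above note that GiLOT's optimal-transport distances are computed with Sinkhorn-Knopp/IPOT-style solvers, without pinning down library versions. For context, a minimal NumPy sketch of the entropy-regularized Sinkhorn-Knopp iteration between two toy distributions is shown below; this is an illustrative stand-in, not the authors' GPU implementation, and the function name, regularization value, and iteration count are assumptions.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b
    under cost matrix C, via Sinkhorn-Knopp scaling iterations.
    (Illustrative sketch; reg and n_iters are arbitrary choices.)"""
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # match column marginals
        u = a / (K @ v)               # match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: two 3-bin distributions, squared-distance cost.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
x = np.arange(3, dtype=float)
C = (x[:, None] - x[None, :]) ** 2

P = sinkhorn(a, b, C)
cost = np.sum(P * C)  # approximate OT distance under plan P
```

After convergence, the transport plan `P` is nonnegative and its row/column sums recover the two input distributions; `cost` is the regularized transport cost. In GiLOT's setting, `a` and `b` would be next-sequence probability distributions and `C` a distance between token embeddings, which is why a parallelized GPU solver is needed at vocabulary scale.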