StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses

Authors: Jia-Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method outperforms strong baselines in dialogue tasks and achieves a 4× speedup while reducing memory usage by 18× compared to dense attention recomputation. We conduct experiments on PersonaChat [35], Multi-Session Chat (MSC) [36], Topical-Chat [37] and MultiWOZ [38] datasets.
Researcher Affiliation | Academia | Jia-Nan Li¹, Quan Tu¹, Cunli Mao², Zhengtao Yu², Ji-Rong Wen¹, Rui Yan¹ (¹Gaoling School of Artificial Intelligence, Renmin University of China; ²Kunming University of Science and Technology). Emails: {lijianan, quantu, jrwen, ruiyan}@ruc.edu.cn; maocunli@163.com; ztyu@hotmail.com
Pseudocode | No | The paper describes its method using natural language and mathematical equations (e.g., attention-mask definitions) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' block or figure with structured, code-like steps. (An illustrative attention-mask sketch is given after the table.)
Open Source Code | Yes | Code: https://github.com/JinaLeejnl/StreamingDialogue
Open Datasets | Yes | We conduct experiments on PersonaChat [35], Multi-Session Chat (MSC) [36], Topical-Chat [37] and MultiWOZ [38] datasets.
Dataset Splits | No | Appendix A, Table 5 provides details on 'Train' and 'Test' utterance counts and average lengths for the datasets used. While it clearly defines training and testing splits, it does not explicitly mention or provide details for a 'validation' split.
Hardware Specification | Yes | Figure 5 depicts the average per-token latency and memory usage during dialogue generation with an NVIDIA A100 GPU across various methods. The SMR & LMR phase requires about 2 hours on two A100-40G GPUs. Dialogue generation takes only about 15 minutes on a single A100-40G GPU. (A latency/memory measurement sketch follows the table.)
Software Dependencies | No | The paper mentions using Llama-2-7B, Llama-2-7B-Chat, Llama-3-8B-Instruct, and Mistral-7B models, but it does not specify exact version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or the specific libraries used for implementation.
Experiment Setup | Yes | We investigate the impact of two hyper-parameters in our method: the number of utterances in SMR samples (s) and the number of query-response pairs in LMR samples (l), both ranging over {8, 12, 16, 20, 24, 28, 32}. We only train the attention layer for 1 epoch, with the learning rate set to 5e-5, utilizing cosine annealing to adjust the learning rate, and setting the warm-up step to 0. All models fine-tune only the attention layer for 2 epochs, with the learning rate set to 5e-5, utilizing cosine annealing to adjust the learning rate, and setting the warm-up step to 0. (A fine-tuning configuration sketch follows the table.)
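
Below is a minimal, hypothetical PyTorch sketch of the kind of sparse attention mask the Pseudocode entry alludes to: each token attends causally within its own utterance and to the end-of-utterance (EoU) positions of earlier utterances, which act as attention sinks. The function, its inputs (utterance_ids, eou_mask), and the sink rule are illustrative assumptions, not the paper's exact mask definition.

    # Hypothetical sketch (not the paper's exact formulation): tokens attend causally
    # within their own utterance and to end-of-utterance (EoU) sink positions.
    import torch

    def build_dialogue_mask(utterance_ids: torch.Tensor, eou_mask: torch.Tensor) -> torch.Tensor:
        """utterance_ids: (seq_len,) utterance index per token.
        eou_mask: (seq_len,) bool, True at end-of-utterance tokens.
        Returns a (seq_len, seq_len) bool mask, True where attention is allowed."""
        seq_len = utterance_ids.size(0)
        causal = torch.tril(torch.ones(seq_len, seq_len)).bool()               # standard causal mask
        same_utt = utterance_ids.unsqueeze(0) == utterance_ids.unsqueeze(1)    # query and key in same utterance
        sink_keys = eou_mask.unsqueeze(0).expand(seq_len, -1)                  # keys that are EoU sinks
        return causal & (same_utt | sink_keys)

    # Toy usage: two utterances of three tokens each, EoU tokens at positions 2 and 5.
    utt = torch.tensor([0, 0, 0, 1, 1, 1])
    eou = torch.tensor([False, False, True, False, False, True])
    print(build_dialogue_mask(utt, eou).int())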
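
As a hedged illustration of the per-token latency and peak-memory measurement mentioned in the Hardware Specification entry, the snippet below times greedy generation and reads peak GPU memory via torch.cuda utilities; the model name, prompt, and token budget are placeholders, not the paper's benchmark script.

    # Rough per-token latency and peak GPU memory measurement (placeholder model and prompt).
    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).cuda().eval()

    inputs = tok("A: Hi, how was your weekend?\nB:", return_tensors="pt").to("cuda")

    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
    print(f"per-token latency: {elapsed / new_tokens * 1000:.1f} ms")
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")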
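
The Experiment Setup entry can be made concrete with the following configuration sketch for a Hugging Face causal LM: only attention-layer parameters stay trainable, AdamW runs at a 5e-5 learning rate, and a cosine-annealing schedule with zero warm-up steps is attached. The module-name filter ("self_attn") and the total step count are assumptions, not the authors' training code.

    # Hedged sketch of the reported recipe: attention-only fine-tuning, lr 5e-5,
    # cosine annealing, no warm-up. Module names and step counts are illustrative.
    import torch
    from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

    # Keep only attention-layer parameters trainable (matched here by "self_attn").
    for name, param in model.named_parameters():
        param.requires_grad = "self_attn" in name

    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=5e-5)

    num_training_steps = 1000  # placeholder; depends on dataset size and epoch count
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
    )

    # In the training loop: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()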