Learning from My Friends: Few-Shot Personalized Conversation Systems via Social Networks
Authors: Zhiliang Tian, Wei Bi, Zihan Zhang, Dongkyu Lee, Yiping Song, Nevin L. Zhang
AAAI 2021, pages 13907–13915 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show our methods outperform all baselines in appropriateness, diversity, and consistency with speakers. The results of all competing methods on automatic metrics are shown in Table 1. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China 2Tencent AI Lab, Shenzhen, China 3National University of Defense Technology, Changsha, China 4HKUST Xiao-i Robot Joint Lab, Hong Kong SAR, China |
| Pseudocode | Yes | Algorithm 1: Training Algorithm |
| Open Source Code | No | We release the code of dataset construction. (The provided link, github.com/tianzhiliang/FewShotPersonaConvData, is explicitly for dataset construction, not the model's methodology code.) |
| Open Datasets | Yes | We collect the dataset from Weibo, an online chatting forum with social networks. ... We release the code of dataset construction: github.com/tianzhiliang/FewShotPersonaConvData |
| Dataset Splits | Yes | We use 28.9K speakers with 2.02M samples for training, 1K speakers with 20K samples for testing, and 0.5K speakers with 10K samples for validation. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or specific machine configurations) were mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with explicit versions) were mentioned. |
| Experiment Setup | Yes | Seq2Seq follows Song et al. 2018, where the embedding and hidden dimensions are 620 and 1000. For the transformer-based model, we implement it as the original one (Vaswani et al. 2017), where the model dimension is 512, the stacked layer number is 6, and the head number is 8. ... we used SGD for the inner loop and Adam for the outer loop with learning rates α = 0.01 and β = 0.0003, respectively. For all methods, the batch size in training is 128. The vocabulary contains the top 50k frequent tokens, and the maximum length of input queries and responses is 80. |
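
To make the reported setup concrete, below is a minimal PyTorch-style sketch of an inner/outer (MAML-style) training step using the hyperparameters quoted above: SGD with α = 0.01 for the inner loop, Adam with β = 0.0003 for the outer loop, batch size 128, a 50k vocabulary, and sequences up to length 80. The model, loss function, task batching, and the first-order gradient approximation are assumptions for illustration; this is not a reproduction of the paper's Algorithm 1 or its architecture.

```python
import copy
import torch

# Hyperparameters reported in the experiment setup (see table above).
INNER_LR = 0.01      # alpha: SGD learning rate for the inner loop
OUTER_LR = 0.0003    # beta: Adam learning rate for the outer loop
BATCH_SIZE = 128
VOCAB_SIZE = 50_000  # top 50k frequent tokens
MAX_LEN = 80         # maximum length of input queries and responses

# Architecture sizes quoted in the table (names here are illustrative).
transformer_cfg = dict(d_model=512, num_layers=6, num_heads=8)   # Vaswani et al. 2017
seq2seq_cfg = dict(embedding_dim=620, hidden_dim=1000)           # Song et al. 2018


def outer_step(model, meta_optimizer, tasks, loss_fn, inner_steps=1):
    """One first-order MAML-style outer update over per-speaker tasks.

    `tasks` is assumed to be a list of (support_batch, query_batch) pairs,
    one per speaker, where each batch is an (inputs, targets) tuple.
    This is a generic meta-learning sketch, not the paper's Algorithm 1.
    """
    meta_optimizer.zero_grad()
    total_query_loss = 0.0

    for support_batch, query_batch in tasks:
        # Inner loop: adapt a copy of the model to this speaker with SGD.
        adapted = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=INNER_LR)
        for _ in range(inner_steps):
            inputs, targets = support_batch
            inner_opt.zero_grad()
            loss_fn(adapted(inputs), targets).backward()
            inner_opt.step()

        # Outer loop: evaluate the adapted model on the query set and
        # accumulate (first-order) gradients into the meta parameters.
        inputs, targets = query_batch
        query_loss = loss_fn(adapted(inputs), targets)
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
        for p, g in zip(model.parameters(), grads):
            p.grad = g.detach() if p.grad is None else p.grad + g.detach()
        total_query_loss += query_loss.item()

    meta_optimizer.step()
    return total_query_loss / max(len(tasks), 1)


# Usage sketch (the generator model itself is omitted):
# model = build_generator(**transformer_cfg)              # hypothetical builder
# meta_opt = torch.optim.Adam(model.parameters(), lr=OUTER_LR)
# avg_loss = outer_step(model, meta_opt, speaker_tasks, torch.nn.CrossEntropyLoss())
```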