CultureLLM: Incorporating Cultural Differences into Large Language Models
Authors: Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, Xing Xie
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on 59 culture-related datasets reveal that CultureLLM significantly surpasses various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%), demonstrating performance comparable to or exceeding that of GPT-4. |
| Researcher Affiliation | Collaboration | Cheng Li (Institute of Software, CAS, chenglicat0228@gmail.com); Mengzhuo Chen (Institute of Software, CAS, mengzhuo.happy@gmail.com); Jindong Wang (Microsoft Research, jindong.wang@microsoft.com); Sunayana Sitaram (Microsoft Research, Sunayana.Sitaram@microsoft.com); Xing Xie (Microsoft Research, xing.xie@microsoft.com) |
| Pseudocode | No | The paper includes a diagram (Figure 2) illustrating the data augmentation process and describes steps in text, but it does not present structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/Scarelette/CultureLLM. |
| Open Datasets | Yes | We adopt culture-related public datasets in specific languages for evaluation. In total, we have 59 test sets, covering 9 languages and containing 68,607 test samples. Details are shown in Appendix B.2. ... We use the World Values Survey (WVS) [Survey, 2022] as seed data. |
| Dataset Splits | No | The paper mentions '59 test sets' and describes fine-tuning data, but does not explicitly provide the training/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper mentions fine-tuning on OpenAI APIs and Llama-2-70b-chat but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for their experiments. |
| Software Dependencies | Yes | We fine-tune CultureLLM using the GPT-3.5 (0613) [OpenAI, 2023a] fine-tuning API...compared with two state-of-the-art LLMs, namely Gemini Pro [Google, 2023] and GPT-4 (1104) [OpenAI, 2023b]...fine-tuned CultureLLM using Llama-2-70b-chat [Touvron et al., 2023]...We use LoRA [Hu et al., 2021] to fine-tune Llama-2-70b-chat. (A hedged sketch of the fine-tuning API workflow follows the table.) |
| Experiment Setup | Yes | The settings for LoRA are listed below: lora_alpha: 16, lora_dropout: 0.1, task_type: CAUSAL_LM. The detailed settings for training are listed below: num_train_epochs: 6, per_device_train_batch_size: 4, gradient_accumulation_steps: 1, optim: paged_adamw_32bit, learning_rate: 2e-4, weight_decay: 0.001, fp16: False, bf16: False, max_grad_norm: 0.3, max_steps: -1, warmup_ratio: 0.03, group_by_length: True, lr_scheduler_type: constant, report_to: tensorboard. (These hyperparameters are restated in the LoRA configuration sketch after the table.) |
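
The Software Dependencies row states that CultureLLM is fine-tuned through the GPT-3.5 (0613) fine-tuning API. The sketch below is a minimal illustration of that workflow using the OpenAI Python SDK; the file name `culture_sft.jsonl`, the example chat-format line, and the exact model identifier `gpt-3.5-turbo-0613` are assumptions, since the paper does not reproduce its API calls.

```python
# Minimal sketch of fine-tuning GPT-3.5 via the OpenAI API, as referenced in the paper.
# Assumptions: the training file name, its contents, and the exact model identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each JSONL line holds one chat-format training example, e.g.:
# {"messages": [{"role": "user", "content": "<augmented WVS question>"},
#               {"role": "assistant", "content": "<culture-specific answer>"}]}
training_file = client.files.create(
    file=open("culture_sft.jsonl", "rb"),  # hypothetical file of augmented seed data
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo-0613",  # assumed identifier for the GPT-3.5 (0613) snapshot
)
print(job.id, job.status)
```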
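
For the Llama-2-70b-chat variant, the Experiment Setup row lists LoRA and training hyperparameters. Below is a minimal sketch of how those values map onto Hugging Face `peft.LoraConfig` and `transformers.TrainingArguments`; the output directory, the LoRA rank `r` (not reported in the paper), and the commented trainer wiring are assumptions rather than the authors' exact implementation.

```python
# Sketch mapping the reported hyperparameters onto peft / transformers objects.
# Assumptions: output_dir, LoRA rank r (not reported), and the commented trainer wiring.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,                      # rank is not stated in the paper; 64 is a placeholder
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./culturellm-llama2-70b",  # hypothetical path
    num_train_epochs=6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
)

# The paper does not specify the trainer; a typical choice would be trl's SFTTrainer, e.g.:
# trainer = SFTTrainer(model="meta-llama/Llama-2-70b-chat-hf", args=training_args,
#                      peft_config=lora_config, train_dataset=dataset)
# trainer.train()
```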