CultureLLM: Incorporating Cultural Differences into Large Language Models

Authors: Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, Xing Xie

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on 60 culture-related datasets reveal that CultureLLM significantly surpasses various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%), demonstrating performance comparable to or exceeding that of GPT-4.
Researcher Affiliation | Collaboration | Cheng Li (Institute of Software, CAS) chenglicat0228@gmail.com; Mengzhuo Chen (Institute of Software, CAS) mengzhuo.happy@gmail.com; Jindong Wang (Microsoft Research) jindong.wang@microsoft.com; Sunayana Sitaram (Microsoft Research) Sunayana.Sitaram@microsoft.com; Xing Xie (Microsoft Research) xing.xie@microsoft.com
Pseudocode | No | The paper includes a diagram (Figure 2) illustrating the data augmentation process and describes the steps in text, but it does not present structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is released at https://github.com/Scarelette/CultureLLM.
Open Datasets | Yes | We adopt culture-related public datasets in specific languages for evaluation. In total, we have 59 test sets, covering 9 languages and containing 68,607 test samples. Details are shown in Appendix B.2. ... We use the World Values Survey (WVS) [Survey, 2022] as seed data.
Dataset Splits | No | The paper mentions '59 test sets' and describes the fine-tuning data, but does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper mentions fine-tuning via OpenAI APIs and on Llama-2-70b-chat but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | Yes | We fine-tune CultureLLM using the GPT-3.5 (0613) [OpenAI, 2023a] fine-tuning API...compared with two state-of-the-art LLMs, namely Gemini Pro [Google, 2023] and GPT-4 (1106) [OpenAI, 2023b]...fine-tuned CultureLLM using Llama-2-70b-chat [Touvron et al., 2023]...We use LoRA [Hu et al., 2021] to fine-tune Llama-2-70b-chat. (A sketch of the OpenAI fine-tuning call is given after the table.)
Experiment Setup | Yes | The settings for LoRA are listed below: lora_alpha: 16, lora_dropout: 0.1, task_type: CAUSAL_LM. The detailed training settings are listed below: num_train_epochs: 6, per_device_train_batch_size: 4, gradient_accumulation_steps: 1, optim: paged_adamw_32bit, learning_rate: 2e-4, weight_decay: 0.001, fp16: False, bf16: False, max_grad_norm: 0.3, max_steps: -1, warmup_ratio: 0.03, group_by_length: True, lr_scheduler_type: constant, report_to: tensorboard. (A hedged code sketch assembling these values is shown after the table.)
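The Experiment Setup row lists the LoRA and training hyperparameters verbatim. Below is a minimal sketch of how those values could be wired together with the Hugging Face peft/transformers/trl stack (SFTTrainer interface as of roughly the paper's release); the base-model identifier, dataset file, output directory, and LoRA rank are assumptions not given in the paper, while every other value mirrors the row above.

```python
# Hypothetical reconstruction of the reported LoRA fine-tuning setup.
# Base model, dataset file, output dir, and LoRA rank are assumptions; the
# remaining hyperparameters mirror the values quoted in the table above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

peft_config = LoraConfig(
    r=8,                      # rank not reported in the paper; peft default used here
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./culturellm-llama2-70b",   # assumed path
    num_train_epochs=6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
)

# Assumed fine-tuning data: WVS-seeded augmented samples in a JSONL file
# with a "text" field; the paper does not publish this exact format.
dataset = load_dataset("json", data_files="augmented_wvs.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed Hub identifier
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
    dataset_text_field="text",
)
trainer.train()
```

Note that fp16 and bf16 are both disabled in the reported settings, so this configuration would train the LoRA adapters in full precision unless quantization were added separately.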
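The Software Dependencies row also reports that CultureLLM was fine-tuned through the GPT-3.5 (0613) fine-tuning API. A minimal sketch of launching such a job with the OpenAI Python client (v1) is shown below; the training-file name and the chat-message format of the augmented WVS samples are assumptions, as the paper does not specify them.

```python
# Hypothetical sketch of a GPT-3.5 fine-tuning job with the OpenAI Python
# client (v1). The training file name and message format are assumptions;
# only the base model snapshot (0613) comes from the paper.
from openai import OpenAI

client = OpenAI()

# Training data is expected as JSONL with chat-formatted examples, e.g.
# {"messages": [{"role": "user", "content": "<WVS-style question>"},
#               {"role": "assistant", "content": "<culture-specific answer>"}]}
upload = client.files.create(
    file=open("culturellm_train.jsonl", "rb"),  # assumed file name
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-3.5-turbo-0613",
)
print(job.id, job.status)
```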