Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Authors: Cheng Li, Damien Teney, Linyi Yang, Qingsong Wen, Xing Xie, Jindong Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. |
| Researcher Affiliation | Collaboration | Cheng Li (Institute of Software, CAS); Damien Teney (Idiap Research Institute); Linyi Yang (Westlake University); Qingsong Wen (Squirrel AI); Xing Xie (Microsoft Research); Jindong Wang (William & Mary) |
| Pseudocode | Yes | Figure 9: Pipeline of data refinement. |
| Open Source Code | Yes | Code is released at https://github.com/Scarelette/CulturePark. |
| Open Datasets | Yes | The seed questions initiating the communication have two sources: World Values Survey (WVS) [Survey, 2022b] and Global Attitudes surveys (GAS) from Pew Research Center [Survey, 2022a]. |
| Dataset Splits | No | The paper mentions 41k samples used for fine-tuning and a test set for evaluation, but does not explicitly provide training/validation/test splits for the fine-tuning data or the source datasets. |
| Hardware Specification | No | The paper mentions using GPT-3.5-Turbo and fine-tuning Llama-2-70b models but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for these operations. |
| Software Dependencies | No | The paper mentions using "text-embedding-3-small" and "K-means" but does not provide specific version numbers for key software components or libraries required for replication, nor a comprehensive list of dependencies. |
| Experiment Setup | Yes | Hyperparameters are shown in Table 6. Table 6: Details on Fine-tuning GPT-3.5-turbo using OpenAI API. Model [various] Epochs [various numbers]. |