Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing
Authors: Letian Peng, Jingbo Shang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments offer a fine-grained and explainable comparison between existing PRP techniques, revealing their advantages and limitations. We evaluate potential PRP techniques (EU, RAG, and LCM) by APC score, which reveals their properties on active and passive constraints. |
| Researcher Affiliation | Academia | Letian Peng, Jingbo Shang; Department of Computer Science, University of California, San Diego; {lepeng, jshang}@ucsd.edu |
| Pseudocode | No | The paper describes its methods using mathematical formulations and prose, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024 |
| Open Datasets | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024 and The input information (persona, query, response) is also generated by prompting GPT-4 based on 3 characters (Beethoven, Newton, Socrates) with many persona statements from Character-LLM. We got 8.4K data for statement-query relevance and 18.9K data for statement-to-response NLI, which are used to fine-tune a state-of-the-art discriminator DeBERTa-V3... |
| Dataset Splits | Yes | We use an 80%/20% train/test split and observe a high (~90%) accuracy referencing GPT-4's labels... (A hedged split-and-fine-tuning sketch follows the table.) |
| Hardware Specification | No | The paper mentions performance in 'it/s' for different models (e.g., 'DeBERTa (Large) ... 150.8 it/s'), implying computational resources were used, but it does not specify the exact hardware components such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions specific models and techniques like 'GPT-4', 'DeBERTa-V3', 'Gemma-1.1-7B-it', 'LoRA', and 'AdamW', but does not provide explicit version numbers for these software components or libraries. |
| Experiment Setup | Yes | Different fine-tuning procedures for Gemma share the same set of hyperparameters. 128-rank LoRA is used to fine-tune the model with AdamW (Loshchilov & Hutter, 2019) as the optimizer, with the learning rate initialized as 2 × 10⁻⁴. Based on the number of persona statements, EU fine-tunes for 20 epochs for original characters and 5 epochs for famous figures. DPO fine-tunes for 10 epochs for all characters. (A hedged LoRA configuration sketch follows the table.) |
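
For concreteness, here is a minimal sketch of how the reported 80%/20% split and DeBERTa-V3 discriminator fine-tuning could be reproduced with Hugging Face `datasets` and `transformers`. The data file name, column names, label set, batch size, and epoch count are illustrative assumptions; the paper specifies only the model family and the split ratio.

```python
# A minimal sketch, assuming Hugging Face `datasets`/`transformers`; the
# JSON file, column names, 3-way label set, batch size, and epoch count
# are illustrative assumptions (the paper fixes only DeBERTa-V3 and the
# 80%/20% split).
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical export of the GPT-4-generated statement-to-response NLI data,
# with "statement", "response", and integer "label" fields.
raw = load_dataset("json", data_files="statement_to_response_nli.json")["train"]
splits = raw.train_test_split(test_size=0.2, seed=42)  # 80% train / 20% test

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=3)  # assumed 3-way NLI labels

def tokenize(batch):
    # Encode each (persona statement, response) pair as one sequence pair.
    return tokenizer(batch["statement"], batch["response"],
                     truncation=True, max_length=512)

splits = splits.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="apc-nli-discriminator",
                           per_device_train_batch_size=16,  # assumed
                           num_train_epochs=3),             # assumed
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,       # enables dynamic padding in the collator
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # held-out accuracy, vs. the ~90% reported
```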
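
Similarly, a hedged sketch of the Gemma fine-tuning setup from the Experiment Setup row, assuming Hugging Face `transformers` and `peft`. Only the 128-rank LoRA, the AdamW optimizer, the initial learning rate of 2 × 10⁻⁴, and the epoch counts come from the paper; the LoRA alpha, target modules, batch size, and the toy training data are assumptions.

```python
# A minimal sketch, assuming Hugging Face `transformers` + `peft`: 128-rank
# LoRA on Gemma-1.1-7B-it with AdamW at an initial learning rate of 2e-4,
# 20 epochs (EU, original characters). LoRA alpha, target modules, batch
# size, and the placeholder dataset are assumptions, not from the paper.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "google/gemma-1.1-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

model = get_peft_model(model, LoraConfig(
    r=128,                                # 128-rank LoRA, as in the paper
    lora_alpha=256,                       # assumption: not stated
    target_modules=["q_proj", "v_proj"],  # assumption: not stated
    task_type="CAUSAL_LM",
))

# Placeholder persona-driven role-playing data; the paper builds this from
# GPT-4-generated (persona, query, response) triples.
train = Dataset.from_dict({"text": ["<persona> ... <query> ... <response> ..."]})
train = train.map(lambda b: tokenizer(b["text"], truncation=True),
                  batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma-eu",
        learning_rate=2e-4,    # initial LR from the paper
        num_train_epochs=20,   # EU, original characters; 5 for famous figures
        optim="adamw_torch",   # AdamW (Loshchilov & Hutter, 2019)
        per_device_train_batch_size=1,  # assumption
    ),
    train_dataset=train,
    # mlm=False makes the collator copy input_ids into labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```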