Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing

Authors: Letian Peng, Jingbo Shang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments offer a fine-grained and explainable comparison between existing PRP techniques, revealing their advantages and limitations. We evaluate potential PRP techniques (EU, RAG, and LCM) by APC score, which reveals their properties on active and passive constraints.
Researcher Affiliation | Academia | Letian Peng, Jingbo Shang; Department of Computer Science, University of California, San Diego; {lepeng, jshang}@ucsd.edu
Pseudocode | No | The paper describes its methods using mathematical formulations and prose, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024
Open Datasets | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024. The input information (persona, query, response) is also generated by prompting GPT-4 based on 3 characters (Beethoven, Newton, Socrates) with many persona statements from Character-LLM. We got 8.4K data for statement-query relevance and 18.9K data for statement-to-response NLI, which are used to fine-tune a state-of-the-art discriminator DeBERTa-V3...
Dataset Splits | Yes | We use 80%/20% train/test split and observe a high (~90%) accuracy referencing GPT-4's labels... (a hedged split-and-fine-tune sketch follows the table)
Hardware Specification | No | The paper reports inference throughput for different models (e.g., "DeBERTa (Large) ... 150.8 it/s"), implying computational resources were used, but it does not specify hardware such as GPU or CPU models.
Software Dependencies | No | The paper mentions specific models and techniques such as GPT-4, DeBERTa-V3, Gemma-1.1-7B-it, LoRA, and AdamW, but does not provide explicit version numbers for these software components or libraries.
Experiment Setup | Yes | Different fine-tuning procedures for Gemma share the same set of hyperparameters. 128-rank LoRA is used to fine-tune the model with AdamW (Loshchilov & Hutter, 2019) as the optimizer and the learning rate initialized to 2 × 10^-4. Based on the number of persona statements, EU for original characters fine-tunes for 20 epochs, while for famous figures it fine-tunes for 5 epochs. DPO fine-tunes for 10 epochs for all characters. (See the LoRA configuration sketch after the table.)
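
For concreteness, here is a minimal sketch of the discriminator pipeline the quoted evidence describes: an 80%/20% train/test split and DeBERTa-V3 fine-tuning on the statement-to-response NLI pairs. It assumes the Hugging Face datasets and transformers libraries; the file name, column names, label set, and training hyperparameters are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: fine-tune a DeBERTa-V3 discriminator on statement-to-response
# NLI pairs with the reported 80%/20% train/test split.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical data file; the repo's actual format may differ.
raw = load_dataset("json", data_files="statement_response_nli.jsonl")["train"]
split = raw.train_test_split(test_size=0.2, seed=42)  # 80%/20% split

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

def encode(batch):
    # Encode each (persona statement, response) pair NLI-style.
    return tok(batch["statement"], batch["response"],
               truncation=True, max_length=256)

split = split.map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=3)  # assumed: entailed / neutral / contradicted

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="apc-discriminator",
                           per_device_train_batch_size=16,  # assumption
                           num_train_epochs=3),             # assumption
    train_dataset=split["train"],
    eval_dataset=split["test"],
)
trainer.train()
print(trainer.evaluate())  # add compute_metrics to report accuracy against GPT-4 labels
```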
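Likewise, a minimal sketch of the quoted Gemma fine-tuning setup (128-rank LoRA, AdamW, learning rate 2 × 10^-4, epoch counts per character type), assuming the peft and transformers libraries; the target modules, LoRA alpha, batch size, and data file are assumptions rather than details given in the paper.

```python
# Hedged sketch: 128-rank LoRA fine-tuning of Gemma-1.1-7B-it with AdamW
# and a 2e-4 learning rate, matching the hyperparameters quoted above.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = AutoModelForCausalLM.from_pretrained("google/gemma-1.1-7b-it")
tok = AutoTokenizer.from_pretrained("google/gemma-1.1-7b-it")

lora = LoraConfig(
    r=128,                    # 128-rank LoRA, as reported
    lora_alpha=256,           # assumption; the quoted setup does not give alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# Hypothetical PRP fine-tuning data; tokenization/collation omitted for brevity.
train_ds = load_dataset("json", data_files="prp_eu_data.jsonl")["train"]

args = TrainingArguments(
    output_dir="gemma-prp-eu",
    learning_rate=2e-4,       # initialized as 2 x 10^-4, as reported
    optim="adamw_torch",      # AdamW (Loshchilov & Hutter, 2019)
    num_train_epochs=20,      # EU, original characters; 5 for famous figures, 10 for DPO
    per_device_train_batch_size=4,  # assumption
)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```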