Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing
Authors: Letian Peng, Jingbo Shang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments offer a fine-grained and explainable comparison between existing PRP techniques, revealing their advantages and limitations. We evaluate potential PRP techniques (EU, RAG, and LCM) by APC score, which reveals their properties on active and passive constraints. |
| Researcher Affiliation | Academia | Letian Peng, Jingbo Shang; Department of Computer Science, University of California, San Diego; {lepeng, jshang}@ucsd.edu |
| Pseudocode | No | The paper describes its methods using mathematical formulations and prose, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024 |
| Open Datasets | Yes | Code, Dataset, Demo: https://github.com/KomeijiForce/Active_Passive_Constraint_Koishiday_2024 and The input information (persona, query, response) is also generated by prompting GPT-4 based on 3 characters (Beethoven, Newton, Socrates) with many persona statements from Character-LLM. We got 8.4K data for statement-query relevance and 18.9K data for statement-to-response NLI, which are used to fine-tune a state-of-the-art discriminator DeBERTa-V3... |
| Dataset Splits | Yes | We use an 80%/20% train/test split and observe a high (~90%) accuracy referencing GPT-4's labels... (A hedged split-and-fine-tuning sketch follows the table.) |
| Hardware Specification | No | The paper mentions performance in 'it/s' for different models (e.g., 'DeBERTa (Large) ... 150.8 it/s'), implying computational resources were used, but it does not specify the exact hardware components such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions specific models and techniques like 'GPT-4', 'DeBERTa-V3', 'Gemma-1.1-7B-it', 'LoRA', and 'AdamW', but does not provide explicit version numbers for these software components or libraries. |
| Experiment Setup | Yes | Different fine-tuning procedures for Gemma share the same set of hyperparameters. 128-rank LoRA is used to fine-tune the model with AdamW (Loshchilov & Hutter, 2019) as the optimizer, with the learning rate initialized as 2 × 10⁻⁴. Based on the number of persona statements, EU fine-tunes for 20 epochs for original characters and 5 epochs for famous figures. DPO fine-tunes for 10 epochs for all characters. (A hedged LoRA configuration sketch follows the table.) |
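
For concreteness, here is a minimal sketch of how the reported 80%/20% split and DeBERTa-V3 discriminator fine-tuning could be reproduced with Hugging Face `datasets` and `transformers`. The data file name, column names, label set, batch size, and epoch count are illustrative assumptions; the paper specifies only the model family and the split ratio.

```python
# A minimal sketch, assuming Hugging Face `datasets`/`transformers`; the
# JSON file, column names, 3-way label set, batch size, and epoch count
# are illustrative assumptions (the paper fixes only DeBERTa-V3 and the
# 80%/20% split).
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical export of the GPT-4-generated statement-to-response NLI data,
# with "statement", "response", and integer "label" fields.
raw = load_dataset("json", data_files="statement_to_response_nli.json")["train"]
splits = raw.train_test_split(test_size=0.2, seed=42)  # 80% train / 20% test

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", num_labels=3)  # assumed 3-way NLI labels

def tokenize(batch):
    # Encode each (persona statement, response) pair as one sequence pair.
    return tokenizer(batch["statement"], batch["response"],
                     truncation=True, max_length=512)

splits = splits.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="apc-nli-discriminator",
                           per_device_train_batch_size=16,  # assumed
                           num_train_epochs=3),             # assumed
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,       # enables dynamic padding in the collator
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # held-out accuracy, vs. the ~90% reported
```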
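
Similarly, a hedged sketch of the Gemma fine-tuning setup from the Experiment Setup row, assuming Hugging Face `transformers` and `peft`. Only the 128-rank LoRA, the AdamW optimizer, the initial learning rate of 2 × 10⁻⁴, and the epoch counts come from the paper; the LoRA alpha, target modules, batch size, and the toy training data are assumptions.

```python
# A minimal sketch, assuming Hugging Face `transformers` + `peft`: 128-rank
# LoRA on Gemma-1.1-7B-it with AdamW at an initial learning rate of 2e-4,
# 20 epochs (EU, original characters). LoRA alpha, target modules, batch
# size, and the placeholder dataset are assumptions, not from the paper.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "google/gemma-1.1-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

model = get_peft_model(model, LoraConfig(
    r=128,                                # 128-rank LoRA, as in the paper
    lora_alpha=256,                       # assumption: not stated
    target_modules=["q_proj", "v_proj"],  # assumption: not stated
    task_type="CAUSAL_LM",
))

# Placeholder persona-driven role-playing data; the paper builds this from
# GPT-4-generated (persona, query, response) triples.
train = Dataset.from_dict({"text": ["<persona> ... <query> ... <response> ..."]})
train = train.map(lambda b: tokenizer(b["text"], truncation=True),
                  batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gemma-eu",
        learning_rate=2e-4,    # initial LR from the paper
        num_train_epochs=20,   # EU, original characters; 5 for famous figures
        optim="adamw_torch",   # AdamW (Loshchilov & Hutter, 2019)
        per_device_train_batch_size=1,  # assumption
    ),
    train_dataset=train,
    # mlm=False makes the collator copy input_ids into labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```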