ReMoDetect: Reward Models Recognize Aligned LLM's Generations

Authors: Hyunseok Lee, Jihoon Tack, Jinwoo Shin

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We provide an extensive evaluation by considering six text domains across twelve aligned LLMs, where our method demonstrates state-of-the-art results. Code is available at https://github.com/hyunseoklee-ai/ReMoDetect. 4 Experiments We provide an empirical evaluation of ReMoDetect by investigating the following questions:
Researcher Affiliation Academia Hyunseok Lee1, Jihoon Tack1, Jinwoo Shin1 1Korea Advanced Institute of Science and Technology {hs.lee,jihoontack,jinwoos}@kaist.ac.kr
Pseudocode No The paper describes its methods using textual descriptions and mathematical equations but does not include explicit pseudocode or algorithm blocks.
Open Source Code Yes Code is available at https://github.com/hyunseoklee-ai/ReMoDetect.
Open Datasets Yes HC3. HC3 is a question-answering dataset that consists of answers written by humans and generated by ChatGPT corresponding to the same questions. The dataset is a collection of several domains: reddit_eli5, open_qa, wiki_csai, medicine, and finance. We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40].
Dataset Splits Yes We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40].
Hardware Specification Yes For the main development, we mainly use Intel(R) Xeon(R) Gold 6426Y CPU @ 2.50GHz and a single A6000 48GB GPU.
Software Dependencies No The paper mentions using the 'AdamW optimizer' and 'nltk framework' and specific reward models like 'OpenAssistant' and 'DeBERTa-v3-Large', but does not provide specific version numbers for software libraries or environments required for full reproducibility.
Experiment Setup Yes We use the AdamW optimizer with a learning rate of 2.0 × 10^-5 with 10% warmup and cosine decay and train for one epoch. For the λ constant for regularization using the replay buffer, we used λ = 0.01. For the β1, β2 parameters that control the contribution of the mixed data, we used 0.3 and 0.3.
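The quoted setup (AdamW, lr 2.0 × 10^-5, 10% linear warmup, cosine decay, λ = 0.01 on a replay-buffer regularizer) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the model, step count, and loss terms are hypothetical placeholders; only the optimizer and schedule hyperparameters come from the paper.

```python
import math
import torch

model = torch.nn.Linear(16, 1)      # stand-in for the reward-model head
total_steps = 100                   # hypothetical; one epoch in the paper
warmup_steps = int(0.1 * total_steps)  # 10% warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=2.0e-5)

def lr_lambda(step: int) -> float:
    # Linear warmup to the base lr, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

lam = 0.01            # λ: weight on the replay-buffer regularization term
beta1 = beta2 = 0.3   # β1, β2: mixing weights for the mixed data

for step in range(total_steps):
    optimizer.zero_grad()
    # The paper's loss would be main_loss + lam * replay_loss; a dummy
    # loss is used here so the sketch runs end to end.
    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The warmup/decay shape matches Hugging Face's `get_cosine_schedule_with_warmup`, which is a common way to realize "10% warmup and cosine decay" in practice.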