ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Authors: Hyunseok Lee, Jihoon Tack, Jinwoo Shin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an extensive evaluation by considering six text domains across twelve aligned LLMs, where our method demonstrates state-of-the-art results. Code is available at https://github.com/hyunseoklee-ai/ReMoDetect. Section 4 (Experiments): We provide an empirical evaluation of ReMoDetect by investigating the following questions. |
| Researcher Affiliation | Academia | Hyunseok Lee¹, Jihoon Tack¹, Jinwoo Shin¹ (¹Korea Advanced Institute of Science and Technology; {hs.lee,jihoontack,jinwoos}@kaist.ac.kr) |
| Pseudocode | No | The paper describes its methods using textual descriptions and mathematical equations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hyunseoklee-ai/ReMoDetect. |
| Open Datasets | Yes | HC3. HC3 is a question-answering dataset that consists of answers written by humans and generated by ChatGPT corresponding to the same questions. The dataset is a collection of several domains: reddit_eli5, open_qa, wiki_csai, medicine, and finance. We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40]. |
| Dataset Splits | Yes | We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40]. (A loading sketch follows the table.) |
| Hardware Specification | Yes | For the main development, we mainly use Intel(R) Xeon(R) Gold 6426Y CPU @ 2.50GHz and a single A6000 48GB GPU. |
| Software Dependencies | No | The paper mentions the AdamW optimizer, the NLTK framework, and specific reward models ('OpenAssistant', 'DeBERTa-v3-Large'), but does not provide version numbers for the software libraries or environments required for full reproducibility. |
| Experiment Setup | Yes | We use the AdamW optimizer with a learning rate of 2.0 × 10⁻⁵ with 10% warm-up and cosine decay, and train it for one epoch. For the λ constant for regularization using the replay buffer, we used λ = 0.01. For the β1, β2 parameters that choose the contribution of the mixed data, we used 0.3 and 0.3. (A configuration sketch follows the table.) |
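
As a concrete reading of the dataset-splits row, here is a minimal sketch of assembling the 2,200-train / 1,000-validation HC3 subset. It assumes the Hugging Face hub copy of HC3 (`Hello-SimpleAI/HC3`, config `all`) with `human_answers`/`chatgpt_answers` fields; the exact sampling used by the prior-work subset [6, 40] is not given in the quoted text, so a fixed-seed shuffle stands in for it here.

```python
# Minimal sketch: build a 2,200-train / 1,000-validation HC3 subset.
# Assumptions: the Hugging Face copy "Hello-SimpleAI/HC3" (config "all") with
# "human_answers" / "chatgpt_answers" fields; the sampling scheme of the
# prior-work subset [6, 40] is unspecified, so a seeded shuffle stands in.
import random

from datasets import load_dataset

hc3 = load_dataset("Hello-SimpleAI/HC3", "all", split="train")

# Flatten each question into labeled answers: 0 = human-written, 1 = ChatGPT.
pairs = []
for row in hc3:
    pairs += [{"text": a, "label": 0} for a in row["human_answers"]]
    pairs += [{"text": a, "label": 1} for a in row["chatgpt_answers"]]

random.seed(0)  # hypothetical seed; the paper does not state one
random.shuffle(pairs)
train_set = pairs[:2200]         # 2,200 training samples, as reported
val_set = pairs[2200:3200]       # 1,000 validation samples, as reported
```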
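And as a reading of the experiment-setup row, this sketch wires up the reported optimization hyperparameters. The model and data loader are placeholders, and the variable names for λ, β1, β2 are ours; note that β1, β2 here are the paper's mixed-data contribution weights, not Adam's momentum coefficients.

```python
# Minimal sketch of the reported setup: AdamW at lr 2.0e-5, 10% warm-up,
# cosine decay, one training epoch. Model and loader are stand-ins.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 1)      # stand-in for the reward-model detector
train_loader = list(range(2200))   # stand-in: one optimizer step per sample

lambda_reg = 0.01        # λ: replay-buffer regularization weight (from the paper)
beta1, beta2 = 0.3, 0.3  # β1, β2: mixed-data contribution weights (from the
                         # paper); unrelated to AdamW's internal betas

optimizer = torch.optim.AdamW(model.parameters(), lr=2.0e-5)
num_steps = len(train_loader)      # one epoch of training
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.10 * num_steps),  # 10% warm-up
    num_training_steps=num_steps,            # cosine decay over the epoch
)
```

The scheduler helper comes from `transformers`; any cosine-with-warmup implementation would serve equally well.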