Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Authors: Hyunseok Lee, Jihoon Tack, Jinwoo Shin
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an extensive evaluation by considering six text domains across twelve aligned LLMs, where our method demonstrates state-of-the-art results. Code is available at https://github.com/hyunseoklee-ai/ReMoDetect. 4 Experiments: We provide an empirical evaluation of ReMoDetect by investigating the following questions: |
| Researcher Affiliation | Academia | Hyunseok Lee 1, Jihoon Tack 1, Jinwoo Shin 1; 1 Korea Advanced Institute of Science and Technology EMAIL |
| Pseudocode | No | The paper describes its methods using textual descriptions and mathematical equations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hyunseoklee-ai/ReMoDetect. |
| Open Datasets | Yes | HC3. HC3 is a question-and-answering dataset that consists of answers written by humans and generated by ChatGPT corresponding to the same questions. The dataset is a collection of several domains: reddit_eli5, open_qa, wiki_csai, medicine, and finance. We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40]. |
| Dataset Splits | Yes | We used training samples of 2,200 and validation samples of 1,000, which is the same subset of HC3 as the prior work [6, 40]. |
| Hardware Specification | Yes | For the main development, we mainly use Intel(R) Xeon(R) Gold 6426Y CPU @ 2.50GHz and a single A6000 48GB GPU. |
| Software Dependencies | No | The paper mentions using 'AdamW optimizer' and 'nltk framework' and specific reward models like 'Open Assistant' and 'DeBERTa-v3-Large', but does not provide specific version numbers for software libraries or environments required for full reproducibility. |
| Experiment Setup | Yes | We use AdamW optimizer with a learning rate of 2.0 × 10⁻⁵ with 10% warm up and cosine decay and train it for one epoch. For the λ constant for regularization using replay buffer, we used λ = 0.01. For the β1, β2 parameters that chooses the contribution of the mixed data, we used 0.3 and 0.3. |
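
The learning-rate schedule quoted in the Experiment Setup row (10% warm-up followed by cosine decay from a peak of 2.0 × 10⁻⁵) can be illustrated with a minimal sketch. The quoted text does not specify the warm-up shape, the final learning rate, or the number of optimizer steps, so the linear ramp, the decay to zero, and the function name `lr_at_step` are assumptions made here for illustration, not the authors' implementation.

```python
import math


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2.0e-5, warmup_frac: float = 0.10) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumptions (not stated in the quoted setup): warm-up is linear,
    and the cosine decay ends at zero.
    """
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warm-up step.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr toward 0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count for one epoch
    for s in (0, 50, 99, 100, 500, 999):
        print(s, lr_at_step(s, total))
```

With a hypothetical 1,000 steps, the rate climbs during the first 100 steps, peaks at 2.0 × 10⁻⁵, and then falls smoothly toward zero by the end of the single training epoch.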