Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Authors: Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A broad range of experiments demonstrate the effectiveness and merits of our algorithms. We conduct extensive experiments to demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Academia | University of Pennsylvania. |
| Pseudocode | Yes | Algorithm 1 MOCAN: Model-based Constrained Alignment via dualizatioN. Algorithm 2 PECAN: Preference-based Constrained Alignment via dualizatioN. Algorithm 3 PECAN with varying KL regularization in pre-alignment. |
| Open Source Code | Yes | The source code is available here.² [footnote 2: https://github.com/shuoli90/CAN] |
| Open Datasets | Yes | We use the PKU-SafeRLHF-30K preference dataset [20]. |
| Dataset Splits | Yes | We use the PKU-SafeRLHF-30K preference dataset [20], which contains approximately 27,000 training and 3,000 testing expert evaluations. |
| Hardware Specification | Yes | In practice, our experiments are conducted on a single 48 GB NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper lists software components such as 'PEFT strategy LoRA' and 'Optimizer paged_adamw_32bit' in its hyperparameter table, but it gives no version numbers for these or for other key dependencies such as Python, PyTorch, or the underlying libraries used in the implementation. |
| Experiment Setup | Yes | See Tables 1, 2, and 3 for the training-related hyper-parameters. |
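The Pseudocode row names MOCAN and PECAN, both built on dualization. As a rough illustration only, the sketch below assumes the standard KL-regularized constrained objective, maximize E[r] - β·KL(π ‖ π_ref) subject to E[g] ≥ b, whose Lagrangian maximizer is π_λ ∝ π_ref·exp((r + λg)/β). Under that assumption the dual function has the closed form D(λ) = β·E_x log E_{y∼π_ref} exp((r + λg)/β) - λb, which is convex in λ, so the optimal λ* ≥ 0 can be found by a one-dimensional search over reward and safety scores of responses sampled offline from the reference model. The function names, the data layout, the bisection solver, and the toy scores are all hypothetical; this is not the code in the linked repository.

```python
import math
from typing import List, Sequence

# Hypothetical data layout (an assumption, not the authors' format):
# rewards[i][j] and safety[i][j] are the reward-model and safety-model scores
# of the j-th response sampled from the reference policy for prompt i.


def _tilted_logits(r_x: Sequence[float], g_x: Sequence[float], lam: float, beta: float) -> List[float]:
    # Logits of the tilted policy pi_lam, restricted to the sampled responses.
    return [(r + lam * g) / beta for r, g in zip(r_x, g_x)]


def dual_value(lam, rewards, safety, beta, b):
    """Monte Carlo estimate of D(lam) = beta * E_x log E_y exp((r + lam*g)/beta) - lam*b."""
    total = 0.0
    for r_x, g_x in zip(rewards, safety):
        z = _tilted_logits(r_x, g_x, lam, beta)
        m = max(z)  # log-mean-exp, shifted by the max for numerical stability
        total += beta * (m + math.log(sum(math.exp(v - m) for v in z) / len(z)))
    return total / len(rewards) - lam * b


def dual_grad(lam, rewards, safety, beta, b):
    """D'(lam): average safety score under the tilted weights, minus the threshold b."""
    total = 0.0
    for r_x, g_x in zip(rewards, safety):
        z = _tilted_logits(r_x, g_x, lam, beta)
        m = max(z)
        w = [math.exp(v - m) for v in z]  # unnormalized softmax weights
        total += sum(wi * g for wi, g in zip(w, g_x)) / sum(w)
    return total / len(rewards) - b


def optimal_dual(rewards, safety, beta, b, lam_max=1e3, tol=1e-8):
    """Minimize the convex dual over lam >= 0 by bisection on its monotone derivative."""
    if dual_grad(0.0, rewards, safety, beta, b) >= 0:
        return 0.0  # constraint already satisfied at lam = 0
    lo, hi = 0.0, 1.0
    # Grow the bracket until the derivative turns nonnegative (or the cap is hit).
    while dual_grad(hi, rewards, safety, beta, b) < 0:
        if hi >= lam_max:
            return lam_max  # constraint looks infeasible on these samples
        lo, hi = hi, min(2.0 * hi, lam_max)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dual_grad(mid, rewards, safety, beta, b) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy example: one prompt, two sampled responses with made-up scores.
    lam_star = optimal_dual([[1.0, 0.0]], [[-1.0, 1.0]], beta=1.0, b=0.0)
    print(round(lam_star, 4))  # analytic optimum is 0.5 for this toy case
```

Under this assumed formulation, λ* would then fix a single combined reward r + λ*·g for one alignment run, which is one plausible reading of "one-shot" in the title. Bisection is used here only because a convex D has a monotone derivative; any scalar convex solver would serve.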