Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Authors: Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, YU LI, Shaohui Liu, Zhuotao Tian
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that CAHL significantly enhances LLM robustness against both conventional attacks and the proposed TCA, exhibiting strong generalization capabilities in zero-shot evaluations while still preserving model performance on generic tasks. |
| Researcher Affiliation | Academia | 1 Harbin Institute of Technology (Shenzhen); 2 Harbin Institute of Technology; 3 Great Bay University; 4 Zhejiang University |
| Pseudocode | No | The paper describes the method using textual explanations and mathematical formulations, for example, in Section 4.3 'Context-Aware Hierarchical Learning', but does not include any clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Our code is available at https://github.com/S2AILab/CAHL. |
| Open Datasets | Yes | To comprehensively evaluate the vulnerabilities of modern LLMs under the TCA mechanism, we instantiate TCA examples by strategically adapting from BUTTON [5], an instruction-tuning dataset specifically designed for function calling. |
| Dataset Splits | Yes | For evaluation, consistent with methodologies in previous works, we employ a test set comprising 805 benign samples and 208 adversarial samples from Alpaca Farm [12]. |
| Hardware Specification | Yes | enabling complete training on a single NVIDIA A100-80G GPU. |
| Software Dependencies | No | Computational efficiency is optimized using Flash Attention-2 [9] to accelerate attention computation, combined with 8-bit quantized AdamW [11] and gradient checkpointing, enabling complete training on a single NVIDIA A100-80G GPU. |
| Experiment Setup | Yes | All models undergo full-parameter supervised fine-tuning (SFT) with a next-token prediction (NTP) loss for three epochs under a cosine learning rate schedule with an initial learning rate of 2e-5. |
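
The Experiment Setup row mentions a cosine learning rate schedule with an initial learning rate of 2e-5 over three epochs. As a rough illustration of what such a schedule computes, here is a minimal sketch in plain Python. It is not the authors' training code: the function name `cosine_lr`, the absence of warmup, the decay floor of 0, and the `steps_per_epoch` value are all assumptions made for the example, since the quoted text does not specify them.

```python
import math


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given optimizer step.

    Assumptions (not stated in the quoted setup): no warmup phase,
    and the rate decays from base_lr down to min_lr (0 by default).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so out-of-range values do not leave the [0, 1] interval.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = 1000   # hypothetical; depends on dataset and batch size
    epochs = 3               # from the quoted experiment setup
    total = steps_per_epoch * epochs
    for step in (0, total // 2, total):
        print(step, cosine_lr(step, total))
```

Under these assumptions the rate starts at 2e-5, passes through roughly half that value at the midpoint of training, and reaches the floor at the final step. A real run would more likely rely on the scheduler shipped with the training framework in use.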