Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Controlling Language and Diffusion Models by Transporting Activations
Authors: Pau Rodriguez, Arno Blaas, Michal Klein, Luca Zappella, Nicholas Apostoloff, Marco Cuturi, Xavier Suau
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that ACT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how ACT enables fine-grained style control and concept negation. |
| Researcher Affiliation | Industry | EMAIL Apple |
| Pseudocode | No | The paper describes methods using mathematical notation and prose, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/apple/ml-act |
| Open Datasets | Yes | We prompt each LLM with 1000 randomly chosen prompts from RealToxicityPrompts (RTP) (Gehman et al., 2020) [...] We evaluate all methods on the TruthfulQA multiple choice part that has been used in prior work (Lin et al., 2021; Li et al., 2024) [...] We mine the OneSec dataset (Scarlini et al., 2019) [...] We sample 2048 prompts from the COCO Captions (Chen et al., 2015) training set |
| Dataset Splits | Yes | We prompt each LLM with 1000 randomly chosen prompts from RealToxicityPrompts (RTP) (Gehman et al., 2020) [...] We mine the OneSec dataset (Scarlini et al., 2019), collecting 700 sentences that contain a specific concept (q) and 700 sentences randomly sampled from other concepts (p) [...] To evaluate, we sample 512 prompts from the COCO Captions validation set and generate images with different intervention strengths. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., a RoBERTa-based classifier, Llama3-8B-instruct, Mistral-7B, Stable Diffusion XL), but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | The degree of intervention can be controlled by a strength parameter λ between 0 (no transport) and 1 (full transport) [...] We intervene upon different layer types (layer column) and show the best layer per method [...] We use a distilled version of SDXL, which only requires 4 diffusion steps |
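
The strength parameter λ quoted in the Experiment Setup row can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it assumes, for illustration only, that the intervention interpolates linearly between the original activation and a transported one, and that the transport map for a single neuron is an affine function fitted by least squares between sorted source and target samples. The names `fit_affine_map` and `intervene` are hypothetical.

```python
from typing import Callable, Sequence


def fit_affine_map(
    source: Sequence[float], target: Sequence[float]
) -> Callable[[float], float]:
    """Fit z -> slope * z + intercept between sorted 1-D samples.

    Assumption for this sketch: one scalar neuron, equally sized samples,
    and an ordinary least-squares fit between the sorted values.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two equally sized samples with at least 2 values")
    a, b = sorted(source), sorted(target)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a)
    if var_a == 0:
        raise ValueError("source activations are constant; slope is undefined")
    slope = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / var_a
    intercept = mean_b - slope * mean_a
    return lambda z: slope * z + intercept


def intervene(
    z: float, transport: Callable[[float], float], strength: float
) -> float:
    """Blend the original activation with its transported value.

    strength = 0 leaves z unchanged (no transport);
    strength = 1 returns transport(z) (full transport).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return (1.0 - strength) * z + strength * transport(z)


# Toy activations for one neuron under a source and a target distribution.
source_acts = [0.0, 1.0, 2.0, 3.0]
target_acts = [10.0, 12.0, 14.0, 16.0]
transport = fit_affine_map(source_acts, target_acts)

for lam in (0.0, 0.5, 1.0):
    print(lam, intervene(1.0, transport, lam))
```

Under these assumptions, intermediate values of λ move an activation only part of the way toward the target distribution, which is one way to read the "different intervention strengths" mentioned in the Dataset Splits row.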