CONTROLLING LANGUAGE AND DIFFUSION MODELS BY TRANSPORTING ACTIVATIONS

  • Pau Rodríguez
  • Arno Blaas
  • Michal Klein
  • Luca Zappella
  • Nicholas Apostoloff
  • Marco Cuturi
  • Xavier Suau
  • Apple Computer

Research output: Chapter in a book, report, anthology, or collection › Conference contribution › Peer-reviewed

Abstract

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviors in the generated output. In this paper we introduce Activation Transport (ACT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. ACT is modality-agnostic and provides fine-grained control over the model behavior with negligible computational overhead, while minimally impacting model abilities. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that ACT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how ACT enables fine-grained style control and concept negation.

Original language: English
Title: 13th International Conference on Learning Representations, ICLR 2025
Publisher: International Conference on Learning Representations, ICLR
Pages: 53912-53955
Number of pages: 44
ISBN (Electronic): 9798331320850
Status: Published - 1 Jan 2025
Externally published: Yes
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 2025 to 28 Apr 2025

Publication series

Name: 13th International Conference on Learning Representations, ICLR 2025

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 to 28/04/25
