Prefix tuning
Prefix tuning prefixes the input sequence with a series of learnable, task-specific vectors while keeping the pretrained model frozen. The prefix parameters are inserted in all of the model layers.
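A minimal sketch of the typical workflow, assuming a GPT-2 base model and illustrative hyperparameters:

from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, get_peft_model

# Wrap a frozen base model with a prefix-tuning adapter; only the prefix
# parameters are trainable.
base = AutoModelForCausalLM.from_pretrained("gpt2")
config = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)
model = get_peft_model(base, config)
model.print_trainable_parameters()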
Initialize from a KV cache prefix
By default, prefix tuning is randomly initialized. PEFT also provides utilities to initialize a prefix-tuning adapter from an existing KV cache prefix (for example, from the first p tokens of a prompt/corpus). This is only supported when prefix_projection=False (the default), because in that case the learned parameters are the KV prefix itself.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, get_peft_model, initialize_kv_prefix_from_text
base = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
peft_cfg = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20, prefix_projection=False)
model = get_peft_model(base, peft_cfg)
initialize_kv_prefix_from_text(
    model,
    tok,
    text="...a long context with at least num_virtual_tokens tokens...",
    use_chat_template=False,
)

Make sure the text is long enough to produce at least num_virtual_tokens tokens, otherwise initialization will fail.
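If you want to verify this up front, you can tokenize the text yourself first. A minimal sketch, reusing tok and peft_cfg from the example above:

# Count the tokens the initialization text would produce before calling
# initialize_kv_prefix_from_text.
text = "...a long context with at least num_virtual_tokens tokens..."
n_tokens = len(tok(text)["input_ids"])
if n_tokens < peft_cfg.num_virtual_tokens:
    raise ValueError(
        f"Got {n_tokens} tokens but need at least {peft_cfg.num_virtual_tokens}."
    )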
The abstract from the paper, Prefix-Tuning: Optimizing Continuous Prompts for Generation, is:
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were “virtual tokens”. We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training.
PrefixTuningConfig
class peft.PrefixTuningConfig
( task_type: Optional[Union[str, TaskType]] = None, peft_type: Optional[Union[str, PeftType]] = None, auto_mapping: Optional[dict] = None, peft_version: Optional[str] = None, base_model_name_or_path: Optional[str] = None, revision: Optional[str] = None, inference_mode: bool = False, num_virtual_tokens: int = None, token_dim: int = None, num_transformer_submodules: Optional[int] = None, num_attention_heads: Optional[int] = None, num_layers: Optional[int] = None, modules_to_save: Optional[list[str]] = None, encoder_hidden_size: int = None, prefix_projection: bool = False )
This is the configuration class to store the configuration of a PrefixEncoder.
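As an illustration, a configuration that enables the reparameterization MLP could look like the following sketch; the values are illustrative, and encoder_hidden_size sets the width of the MLP that is only applied when prefix_projection=True:

from peft import PrefixTuningConfig

# Prefix tuning with the reparameterization MLP enabled. When
# prefix_projection=False (the default), encoder_hidden_size is not used.
config = PrefixTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=30,
    prefix_projection=True,
    encoder_hidden_size=512,
)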
PrefixEncoder
class peft.PrefixEncoder
( config )
Parameters
- config (PrefixTuningConfig) — The configuration of the prefix encoder.
The torch.nn model to encode the prefix.
Example:
>>> from peft import PrefixEncoder, PrefixTuningConfig
>>> config = PrefixTuningConfig(
... peft_type="PREFIX_TUNING",
... task_type="SEQ_2_SEQ_LM",
... num_virtual_tokens=20,
... token_dim=768,
... num_transformer_submodules=1,
... num_attention_heads=12,
... num_layers=12,
... encoder_hidden_size=768,
... )
>>> prefix_encoder = PrefixEncoder(config)

Attributes:
- embedding (torch.nn.Embedding) — The embedding layer of the prefix encoder.
- transform (torch.nn.Sequential) — The two-layer MLP to transform the prefix embeddings if prefix_projection is True.
- prefix_projection (bool) — Whether to project the prefix embeddings.
Input shape: (batch_size, num_virtual_tokens)
Output shape: (batch_size, num_virtual_tokens, 2*layers*hidden)
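As a concrete check of these shapes, here is a sketch using the same numbers as the example above; in normal use get_peft_model builds and calls the encoder for you:

import torch
from peft import PrefixEncoder, PrefixTuningConfig

config = PrefixTuningConfig(
    peft_type="PREFIX_TUNING",
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=20,
    token_dim=768,
    num_transformer_submodules=1,
    num_attention_heads=12,
    num_layers=12,
    encoder_hidden_size=768,
)
prefix_encoder = PrefixEncoder(config)

# Indices of the virtual tokens: shape (batch_size, num_virtual_tokens).
prefix_indices = torch.arange(config.num_virtual_tokens).unsqueeze(0)
past_key_values = prefix_encoder(prefix_indices)
print(past_key_values.shape)  # torch.Size([1, 20, 18432]); 2 * 12 * 768 = 18432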
Loading the flattened prompt embeddings saved by PEFT (prompt_embeddings) back into a prefix-tuning adapter is only supported when prefix_projection=False, because in that case the learned parameters are the KV prefix itself (embedding.weight has shape [num_virtual_tokens, num_layers*2*token_dim]). If prefix_projection=True, the learned parameters are the virtual token embeddings plus an MLP, and there is no general way to invert the projection to recover those parameters from a flattened KV prefix.
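To make the prefix_projection=False case concrete, here is a minimal, hypothetical sketch of what such a load amounts to, assuming model is the prefix-tuned PeftModel from the earlier example; the file name is a placeholder, and a real loader would additionally want to validate shapes, dtypes, and device placement:

import torch

# Hypothetical sketch: copy a flattened KV prefix of shape
# [num_virtual_tokens, num_layers * 2 * token_dim] into the prefix encoder's
# embedding table. Only valid when prefix_projection=False.
kv_prefix = torch.load("prompt_embeddings.pt")  # placeholder file name
prefix_encoder = model.prompt_encoder[model.active_adapter]
assert kv_prefix.shape == prefix_encoder.embedding.weight.shape
with torch.no_grad():
    prefix_encoder.embedding.weight.copy_(kv_prefix)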