Adaptively-tuned Qwen2.5-3B Paraphraser

This model is an adaptively fine-tuned version of Qwen2.5-3B-Instruct optimized to evade the Unigram watermarking method while preserving text quality. It serves as a paraphrasing model that maintains semantic meaning while modifying the statistical patterns used for watermark detection.

Model Details

Model Description

This model is a fine-tuned version of Qwen2.5-3B-Instruct that has been optimized using Direct Preference Optimization (DPO) to evade the Unigram watermarking method described in Zhao et al. (2023). The model preserves text quality while modifying the statistical patterns that watermarking methods rely on for detection.

  • Model type: Decoder-only transformer language model
  • Language(s): English
  • Finetuned from model: Qwen/Qwen2.5-3B-Instruct
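For background, the Unigram scheme uses a single, position-independent green list derived from a secret seed, and flags text whose fraction of green-list tokens is statistically above the expected rate γ. The sketch below is a toy illustration of that detection statistic, not the authors' implementation; `green_list`, `z_score`, and all parameter values are illustrative.

```python
import math
import random

def green_list(vocab_size, gamma=0.5, seed=42):
    # Unigram-style: one fixed green list shared by every position,
    # derived from a secret seed
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def z_score(token_ids, green, gamma=0.5):
    # z-score of the observed green-token count against the gamma * T
    # expectation under unwatermarked text
    t = len(token_ids)
    hits = sum(tok in green for tok in token_ids)
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

A paraphraser evades this detector by producing text whose tokens hit the green list at roughly the base rate γ, pushing the z-score below the detection threshold.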

Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (torch_dtype="auto" and device_map="auto" are optional
# conveniences for GPU inference; device_map requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "DDiaa/WM-Removal-Unigram-Qwen2.5-3B")

# Prepare the prompt

system_prompt = (
    "You are an expert copy-editor. Please rewrite the following text in your own voice and paraphrase all "
    "sentences.\n Ensure that the final output contains the same information as the original text and has "
    "roughly the same length.\n Do not leave out any important details when rewriting in your own voice. Do "
    "not include any information that is not present in the original text. Do not respond with a greeting or "
    "any other extraneous information. Skip the preamble. Just rewrite the text directly."
)

def paraphrase_text(text):
    # Prepare prompt
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"\n[[START OF TEXT]]\n{text}\n[[END OF TEXT]]"},
        ],
        tokenize=False,
        add_generation_prompt=True,
    ) + "[[START OF PARAPHRASE]]\n"
    
    # Generate paraphrase
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=1.0,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
    
    # Post-process: decode the full sequence, then keep only the text between
    # the [[START OF PARAPHRASE]] marker and the closing [[END OF ...]] marker
    paraphrased = tokenizer.decode(outputs[0], skip_special_tokens=True)
    paraphrased = paraphrased.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()
    
    return paraphrased
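The post-processing step above depends on the `[[START OF PARAPHRASE]]` / `[[END OF ...]]` delimiters that the prompt appends. A minimal self-contained sketch of that extraction logic (the `extract_paraphrase` helper name is illustrative):

```python
def extract_paraphrase(decoded: str) -> str:
    # Keep only the text between the start marker and any "[[END OF" marker;
    # raises IndexError if the model never emitted the start marker
    return decoded.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()

decoded = (
    "system prompt ... [[START OF TEXT]]original[[END OF TEXT]]"
    "[[START OF PARAPHRASE]]\nThe rewritten text.\n[[END OF PARAPHRASE]]"
)
print(extract_paraphrase(decoded))  # → The rewritten text.
```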

Uses

Direct Use

The model is designed for research purposes to:

  1. Study the robustness of watermarking methods
  2. Evaluate the effectiveness of adaptive attacks against content watermarks
  3. Test and develop improved watermarking techniques

Downstream Use

The model can be integrated into:

  • Watermark robustness evaluation pipelines
  • Research frameworks studying language model security
  • Benchmark suites for watermarking methods

Out-of-Scope Use

This model should not be used for:

  • Production environments requiring watermark compliance
  • Generating deceptive or misleading content
  • Evading legitimate content attribution systems
  • Any malicious purposes that could harm individuals or society

Bias, Risks, and Limitations

  • The model inherits biases from the base Qwen2.5-3B-Instruct model
  • Performance varies based on text length and complexity
  • Evasion capabilities may be reduced against newer watermarking methods
  • May occasionally produce lower quality outputs compared to the base model
  • Limited to English language texts

Recommendations

  • Use only for research and evaluation purposes
  • Always maintain proper content attribution
  • Monitor output quality metrics
  • Consider ethical implications when studying security measures
  • Use in conjunction with other evaluation methods

Citation

BibTeX:

@article{diaa2024optimizing,
  title={Optimizing adaptive attacks against content watermarks for language models},
  author={Diaa, Abdulrahman and Aremu, Toluwani and Lukas, Nils},
  journal={arXiv preprint arXiv:2410.02440},
  year={2024}
}

Model Card Contact

For questions about this model, please file an issue on the GitHub repository: https://github.com/ML-Watermarking/ada-llm-wm
