Adaptively-tuned Llama-3.2-3B Paraphraser
This model is an adaptively fine-tuned version of Llama-3.2-3B-Instruct, optimized to evade the Unigram watermarking method while preserving text quality. It acts as a paraphraser: it keeps the semantic content of the input while altering the statistical patterns that watermark detectors rely on.
Model Details
Model Description
This model is a fine-tuned version of Llama-3.2-3B-Instruct that has been optimized with Direct Preference Optimization (DPO) to evade the Unigram watermarking method described in Zhao et al. (2023). The model preserves text quality while modifying the statistical patterns that watermarking methods rely on for detection.
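DPO trains the policy to prefer watermark-evading paraphrases over detectable ones without a separate reward model. The training code is not part of this card, but a minimal sketch of the DPO loss for one preference pair, assuming per-sequence log-probabilities are already available (the function name and inputs here are illustrative, not the authors' implementation):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward margin of the policy relative to the frozen reference model
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # assigns more probability to the preferred (evasive) paraphrase
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree (zero margin) the loss is log 2, and it decreases monotonically as the preferred paraphrase gains probability mass.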
- Model type: Decoder-only transformer language model
- Language(s): English
- Finetuned from model: meta-llama/Llama-3.2-3B-Instruct
Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "DDiaa/WM-Removal-Unigram-Llama-3.2-3B")

# Prepare the prompt
system_prompt = (
    "You are an expert copy-editor. Please rewrite the following text in your own voice and paraphrase all "
    "sentences.\n Ensure that the final output contains the same information as the original text and has "
    "roughly the same length.\n Do not leave out any important details when rewriting in your own voice. Do "
    "not include any information that is not present in the original text. Do not respond with a greeting or "
    "any other extraneous information. Skip the preamble. Just rewrite the text directly."
)

def paraphrase_text(text):
    # Build the chat prompt and open the paraphrase marker
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"\n[[START OF TEXT]]\n{text}\n[[END OF TEXT]]"},
        ],
        tokenize=False,
        add_generation_prompt=True,
    ) + "[[START OF PARAPHRASE]]\n"

    # Generate the paraphrase
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=1.0,
        do_sample=True,
        # Fall back to EOS if the tokenizer defines no pad token (Llama tokenizers often don't)
        pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
    )

    # Keep only the text between the paraphrase markers
    paraphrased = tokenizer.decode(outputs[0], skip_special_tokens=True)
    paraphrased = paraphrased.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()
    return paraphrased
```
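The delimiter protocol in the generation code can be exercised without loading a model. A minimal sketch of the post-processing step in isolation, assuming a decoded string that contains the markers:

```python
def extract_paraphrase(decoded: str) -> str:
    # Keep only the text between the paraphrase markers, mirroring
    # the post-processing inside paraphrase_text above
    return decoded.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()
```

Splitting on the truncated `"[[END OF"` prefix handles both a complete end marker and a generation that is cut off mid-marker by the token limit.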
Uses
Direct Use
The model is designed for research purposes to:
- Study the robustness of watermarking methods
- Evaluate the effectiveness of adaptive attacks against content watermarks
- Test and develop improved watermarking techniques
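For context on what the paraphraser is evading: the Unigram scheme of Zhao et al. (2023) uses a single fixed "green list" of tokens, and detection counts green tokens against their expected rate under unwatermarked text. A minimal sketch of that z-score test, assuming token IDs and a green set are given (this is a simplified illustration, not the reference detector):

```python
import math

def unigram_zscore(tokens, green_set, gamma=0.5):
    # Count tokens that fall in the fixed global green list
    green_hits = sum(1 for t in tokens if t in green_set)
    n = len(tokens)
    # z-score against the null hypothesis that each token is
    # green independently with probability gamma
    return (green_hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A successful paraphrase attack drives this score back toward zero by replacing green tokens with semantically equivalent non-green ones.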
Downstream Use
The model can be integrated into:
- Watermark robustness evaluation pipelines
- Research frameworks studying language model security
- Benchmark suites for watermarking methods
Out-of-Scope Use
This model should not be used for:
- Production environments requiring watermark compliance
- Generating deceptive or misleading content
- Evading legitimate content attribution systems
- Any malicious purposes that could harm individuals or society
Bias, Risks, and Limitations
- The model inherits biases from the base Llama-3.2-3B-Instruct model
- Performance varies based on text length and complexity
- Evasion capabilities may be reduced against newer watermarking methods
- May occasionally produce lower quality outputs compared to the base model
- Limited to English language texts
Recommendations
- Use only for research and evaluation purposes
- Always maintain proper content attribution
- Monitor output quality metrics
- Consider ethical implications when studying security measures
- Use in conjunction with other evaluation methods
Citation
BibTeX:
```bibtex
@article{diaa2024optimizing,
  title={Optimizing adaptive attacks against content watermarks for language models},
  author={Diaa, Abdulrahman and Aremu, Toluwani and Lukas, Nils},
  journal={arXiv preprint arXiv:2410.02440},
  year={2024}
}
```
Model Card Contact
For questions about this model, please file an issue on the GitHub repository: https://github.com/ML-Watermarking/ada-llm-wm