---
base_model: meta-llama/Llama-2-7b-chat-hf
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---
# Adaptively-tuned Llama-2-7B Paraphraser

This model is an adaptively fine-tuned version of Llama-2-7B-chat, optimized to evade the EXP watermarking method while preserving text quality. It serves as a paraphrasing model that keeps the semantic meaning of its input while altering the statistical patterns that watermark detectors rely on.

## Model Details

### Model Description

This model is a fine-tuned version of Llama-2-7B-chat that has been optimized using Direct Preference Optimization (DPO) to evade the EXP watermarking method described in Aaronson and Kirchner (2023). The model preserves text quality while modifying the statistical patterns that watermarking methods rely on for detection.

- **Model type:** Decoder-only transformer language model
- **Language(s):** English
- **Finetuned from model:** meta-llama/Llama-2-7b-chat-hf
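For context, the EXP scheme (Aaronson and Kirchner's exponential minimum sampling) detects its watermark by recomputing, for each token of a text, a pseudorandom value keyed on a secret key and the preceding context, and summing a per-token score that is inflated when the text was sampled with the watermark. The sketch below is a minimal illustration of that detection statistic only; the `prf_value` helper, the hash construction, and the context window are illustrative assumptions, not the exact implementation evaluated in the paper.

```python
import hashlib
import math

def prf_value(key: str, context: tuple, token_id: int) -> float:
    # Illustrative keyed PRF: maps (key, preceding tokens, candidate token)
    # to a pseudo-uniform value in (0, 1). Real implementations differ.
    digest = hashlib.sha256(f"{key}|{context}|{token_id}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def exp_detection_score(token_ids: list, key: str, window: int = 4) -> float:
    # EXP-style statistic: sum of -log(1 - r_t), where r_t is the PRF value of
    # the token that was actually emitted. Watermarked sampling favors tokens
    # with r_t close to 1, so the sum grows unusually fast; a paraphraser that
    # rewrites the token sequence breaks the alignment between tokens and their
    # PRF values and pulls the score back toward its unwatermarked baseline.
    score = 0.0
    for t in range(window, len(token_ids)):
        context = tuple(token_ids[t - window:t])
        r = prf_value(key, context, token_ids[t])
        score += -math.log(1.0 - r)
    return score
```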
## Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Llama-2 tokenizers ship without a pad token; reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "DDiaa/EXP-Llama-2-7B")

# Paraphrasing system prompt
system_prompt = (
    "You are an expert copy-editor. Please rewrite the following text in your own voice and paraphrase all "
    "sentences.\n Ensure that the final output contains the same information as the original text and has "
    "roughly the same length.\n Do not leave out any important details when rewriting in your own voice. Do "
    "not include any information that is not present in the original text. Do not respond with a greeting or "
    "any other extraneous information. Skip the preamble. Just rewrite the text directly."
)

def paraphrase_text(text):
    # Build the chat prompt and append the paraphrase start marker
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"\n[[START OF TEXT]]\n{text}\n[[END OF TEXT]]"},
        ],
        tokenize=False,
        add_generation_prompt=True,
    ) + "[[START OF PARAPHRASE]]\n"

    # Generate the paraphrase
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=1.0,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

    # Keep only the text between the paraphrase markers
    paraphrased = tokenizer.decode(outputs[0], skip_special_tokens=True)
    paraphrased = paraphrased.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()

    return paraphrased
```
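For example, applying the helper to a placeholder passage (replace the string with actual watermarked model output):

```python
watermarked_text = (
    "Example passage that would normally come from a watermarked language model. "
    "Replace it with the text you want to rewrite."
)
print(paraphrase_text(watermarked_text))
```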

## Uses

### Direct Use

The model is designed for research purposes to:
1. Study the robustness of watermarking methods
2. Evaluate the effectiveness of adaptive attacks against content watermarks
3. Test and develop improved watermarking techniques

### Downstream Use

The model can be integrated into:
- Watermark robustness evaluation pipelines (a sketch follows this list)
- Research frameworks studying language model security
- Benchmark suites for watermarking methods
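A minimal sketch of such a pipeline, assuming you already have a watermark detector that returns a p-value; the `detect_pvalue` callable and the significance level are placeholders, and `paraphrase_text` is the helper defined above:

```python
# Hypothetical robustness check: compare watermark detection rates before and
# after paraphrasing. `detect_pvalue` stands in for whichever EXP detector
# implementation you use; it is not provided by this repository.
def evaluate_robustness(watermarked_texts, detect_pvalue, alpha=0.01):
    detected_before, detected_after = 0, 0
    for text in watermarked_texts:
        rewritten = paraphrase_text(text)
        detected_before += detect_pvalue(text) < alpha
        detected_after += detect_pvalue(rewritten) < alpha
    n = len(watermarked_texts)
    print(f"Detection rate before attack: {detected_before / n:.2%}")
    print(f"Detection rate after attack:  {detected_after / n:.2%}")
```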
### Out-of-Scope Use

This model should not be used for:
- Production environments requiring watermark compliance
- Generating deceptive or misleading content
- Evading legitimate content attribution systems
- Any malicious purpose that could harm individuals or society

## Bias, Risks, and Limitations

- The model inherits biases from the base Llama-2-7B-chat model
- Performance varies with text length and complexity
- Evasion capability may be reduced against newer watermarking methods
- It may occasionally produce lower-quality outputs than the base model
- It is limited to English-language text

### Recommendations

- Use only for research and evaluation purposes
- Always maintain proper content attribution
- Monitor output quality metrics (a similarity check is sketched below)
- Consider the ethical implications of studying security measures
- Use in conjunction with other evaluation methods
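One way to monitor quality is to score the semantic similarity between the original text and its paraphrase with an off-the-shelf sentence-embedding model. The sketch below assumes the `sentence-transformers` package; the embedding model name and the 0.8 threshold are arbitrary illustrative choices, and `paraphrase_text` is the helper defined above.

```python
# Illustrative quality check: flag paraphrases that drift too far from the
# original meaning. Model choice and threshold are arbitrary examples.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def semantic_similarity(original: str, paraphrase: str) -> float:
    emb = embedder.encode([original, paraphrase], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

original = "Watermarking embeds a detectable statistical signal in generated text."
rewrite = paraphrase_text(original)
if semantic_similarity(original, rewrite) < 0.8:
    print("Warning: paraphrase may have lost important content.")
```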
## Citation

**BibTeX:**
```bibtex
@article{diaa2024optimizing,
  title={Optimizing adaptive attacks against content watermarks for language models},
  author={Diaa, Abdulrahman and Aremu, Toluwani and Lukas, Nils},
  journal={arXiv preprint arXiv:2410.02440},
  year={2024}
}
```

## Model Card Contact

For questions about this model, please file an issue on the GitHub repository: https://github.com/ML-Watermarking/ada-llm-wm