---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Adaptively-tuned Qwen2.5-1.5B Paraphraser

This model is an adaptively fine-tuned version of Qwen2.5-1.5B-Instruct, optimized to evade the EXP watermarking method while preserving text quality. It acts as a paraphraser: it keeps the semantic content of the input while disrupting the statistical patterns that watermark detectors rely on.

## Model Details

### Model Description

The model was fine-tuned from Qwen2.5-1.5B-Instruct using Direct Preference Optimization (DPO) to evade the EXP watermarking method described by Aaronson and Kirchner (2023) while preserving text quality (a sketch of the detector's scoring rule appears after the list below).

- **Model type:** Decoder-only transformer language model
- **Language(s):** English
- **Finetuned from model:** Qwen2.5-1.5B-Instruct
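For orientation, the EXP detector re-derives, at each position, the pseudorandom values that guided watermarked generation and scores how strongly the observed tokens align with them. Below is a minimal sketch of that scoring rule; the key, hash construction, and context width are hypothetical stand-ins, not the exact setup from the paper or this model's evaluation:

```python
import hashlib

import numpy as np

WATERMARK_KEY = b"secret-key"  # hypothetical shared key held by the detector
CONTEXT_WIDTH = 4              # previous tokens hashed per step (illustrative choice)

def seeded_uniforms(context_ids, vocab_size):
    """Re-derive the per-step pseudorandom uniforms from the key and local context."""
    digest = hashlib.sha256(WATERMARK_KEY + repr(tuple(context_ids)).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.random(vocab_size)

def exp_detection_score(token_ids, vocab_size):
    """Score the text as the sum of -log(1 - r_t[x_t]) over scored positions.

    For unwatermarked text each term is approximately Exp(1), so the score
    concentrates around the number of scored tokens; text generated with the
    watermark scores significantly higher.
    """
    score = 0.0
    for t in range(CONTEXT_WIDTH, len(token_ids)):
        r = seeded_uniforms(token_ids[t - CONTEXT_WIDTH:t], vocab_size)
        score += -np.log(1.0 - r[token_ids[t]])
    return score
```

Paraphrasing rewrites the token sequence, which breaks the alignment between observed tokens and the re-derived uniforms and pushes the score back toward its unwatermarked baseline.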
## Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "DDiaa/EXP-Qwen2.5-1.5B")
model.eval()

# Paraphrasing instruction used as the system prompt
system_prompt = (
    "You are an expert copy-editor. Please rewrite the following text in your own voice and paraphrase all "
    "sentences.\n Ensure that the final output contains the same information as the original text and has "
    "roughly the same length.\n Do not leave out any important details when rewriting in your own voice. Do "
    "not include any information that is not present in the original text. Do not respond with a greeting or "
    "any other extraneous information. Skip the preamble. Just rewrite the text directly."
)

def paraphrase_text(text):
    # Build the chat prompt and open the paraphrase tag for the model to complete
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"\n[[START OF TEXT]]\n{text}\n[[END OF TEXT]]"},
        ],
        tokenize=False,
        add_generation_prompt=True,
    ) + "[[START OF PARAPHRASE]]\n"

    # Generate the paraphrase
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=1.0,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

    # Keep only the text between the paraphrase tags
    paraphrased = tokenizer.decode(outputs[0], skip_special_tokens=True)
    paraphrased = paraphrased.split("[[START OF PARAPHRASE]]")[1].split("[[END OF")[0].strip()

    return paraphrased
```
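A quick usage example (the input string is a placeholder for the text you want to paraphrase):

```python
watermarked_text = "..."  # replace with the (suspected) watermarked text
print(paraphrase_text(watermarked_text))
```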
## Uses

### Direct Use

The model is designed for research purposes to:

1. Study the robustness of watermarking methods
2. Evaluate the effectiveness of adaptive attacks against content watermarks
3. Test and develop improved watermarking techniques

### Downstream Use

The model can be integrated into:

- Watermark robustness evaluation pipelines (a sketch follows this list)
- Research frameworks studying language model security
- Benchmark suites for watermarking methods
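As one illustration of such a pipeline, the sketch below reuses `paraphrase_text` from the Get Started section; `watermarked_texts` and `detect` (which returns the watermark detector's p-value for a text) are hypothetical placeholders:

```python
# Hypothetical harness: `watermarked_texts` holds EXP-watermarked samples and
# `detect(text)` returns the watermark detector's p-value for that text.
def evaluate_evasion(watermarked_texts, detect, alpha=0.01):
    n = len(watermarked_texts)
    flagged_before = sum(detect(t) < alpha for t in watermarked_texts)
    flagged_after = sum(detect(paraphrase_text(t)) < alpha for t in watermarked_texts)
    print(f"detected before paraphrasing: {flagged_before}/{n}")
    print(f"detected after paraphrasing:  {flagged_after}/{n}")
```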
### Out-of-Scope Use

This model should not be used for:

- Production environments requiring watermark compliance
- Generating deceptive or misleading content
- Evading legitimate content attribution systems
- Any malicious purpose that could harm individuals or society

## Bias, Risks, and Limitations

- The model inherits biases from the base Qwen2.5-1.5B-Instruct model
- Performance varies with text length and complexity
- Evasion capability may be reduced against newer watermarking methods
- Outputs may occasionally be of lower quality than the base model's
- Limited to English-language text

### Recommendations

- Use only for research and evaluation purposes
- Always maintain proper content attribution
- Monitor output quality metrics (one option is sketched below)
- Consider ethical implications when studying security measures
- Use in conjunction with other evaluation methods
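For the quality-monitoring point above, one simple option is embedding similarity between input and output; sentence-transformers and the all-MiniLM-L6-v2 model are illustrative choices, not something this repository prescribes:

```python
from sentence_transformers import SentenceTransformer

# Illustrative quality check: cosine similarity between original and paraphrase.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(original, paraphrase):
    emb = embedder.encode([original, paraphrase], normalize_embeddings=True)
    return float(emb[0] @ emb[1])  # cosine similarity; closer to 1.0 is better
```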
## Citation

**BibTeX:**

```bibtex
@article{diaa2024optimizing,
  title={Optimizing adaptive attacks against content watermarks for language models},
  author={Diaa, Abdulrahman and Aremu, Toluwani and Lukas, Nils},
  journal={arXiv preprint arXiv:2410.02440},
  year={2024}
}
```

## Model Card Contact

For questions about this model, please file an issue on the GitHub repository: https://github.com/ML-Watermarking/ada-llm-wm