Update README.md
Our paper can be accessed here. Paper link: [https://arxiv.org/abs/2410.00361](https://arxiv.org/abs/2410.00361)
Our github [https://github.com/dut-laowang/emnlp24-PclGPT](https://github.com/dut-laowang/emnlp24-PclGPT)

# Inference

After downloading the weights, use the following code for single-sample inference with PclGPT-CN.
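If the checkpoint is not yet on disk, it can be pre-fetched explicitly. This is a minimal sketch using `huggingface_hub`, not part of the original instructions; `from_pretrained` in the snippet below will also download the weights automatically on first use.

```python
from huggingface_hub import snapshot_download

# Download the PclGPT-CN weights into the local Hugging Face cache
snapshot_download(repo_id="DUTIR-Wang/PclGPT-CN")
```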
```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the half-precision model onto the GPU
tokenizer = AutoTokenizer.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True)
model = AutoModel.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True).half().cuda()

def generate_response():
    # Sample text: "A mere floor sweeper like you sure is impressive."
    sample_text = "区区扫地的,你可真是厉害坏了"

    # Prompt: act as a linguist and grade the text as A (not condescending),
    # B (mildly), C (moderately), or D (severely condescending);
    # the model must answer with the option letter only
    instruction = (
        "假定你是一名语言学家,检测居高临下言论。居高临下言论是优势地位群体针对弱势群体的优越言论,"
        "语言攻击性较弱,但往往夹杂着讽刺,刻板印象,会对弱势群体造成伤害。"
        "居高临下根据语气程度分为 A级:非居高临下(正常非歧视、非冷暴力言论),"
        "B级:轻微居高临下(语气略带鼓励或怜悯),C级:中等居高临下(说话人较为客观陈述,但语气带有歧视),"
        "D级:严重居高临下(说话人语气轻蔑,严重歧视弱势群体)。"
        "接下来将给你一段文本,根据上述规则,你负责判断该文本属于(A/B/C/D级)的哪一级,并只回答选项。"
        "-> 文本:({})"
    ).format(sample_text)

    # Tokenize and run model inference
    inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_length=1024)
    output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    model_output = extract_option(output_text)
    return classify_output(model_output)

def extract_option(output_text):
    # Scan the generated text from the end and return the last option letter found
    options = ['A', 'B', 'C', 'D']
    for char in reversed(output_text.strip()):
        if char in options:
            return char
    return "无法识别的输出"  # "Unrecognized output"

def classify_output(model_output):
    # Map the predicted option to a human-readable verdict
    if model_output == "A":
        return "判断为A级:非居高临下"  # Grade A: not condescending
    elif model_output == "B":
        return "判断为B级:轻微居高临下"  # Grade B: mildly condescending
    elif model_output == "C":
        return "判断为C级:中等居高临下"  # Grade C: moderately condescending
    elif model_output == "D":
        return "判断为D级:严重居高临下"  # Grade D: severely condescending
    else:
        return "无法识别的输出,请检查输入或模型输出"  # Unrecognized output; check the input or the model output

response = generate_response()
print(response)
```
The output will be

```
"判断为D级:严重居高临下"
```

The verdict reads "Grade D: severely condescending."
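To label many texts in one go, the same prompt can be applied in batches. The sketch below is not part of the original README: it assumes the `tokenizer` and `model` loaded above, that the prompt string from `generate_response()` has been factored out into a `PROMPT_TEMPLATE` variable, and that the tokenizer provides a pad token; the helper name `classify_batch` is illustrative.

```python
def classify_batch(texts, batch_size=8):
    # Classify a list of texts and return one verdict string per input
    verdicts = []
    for start in range(0, len(texts), batch_size):
        prompts = [PROMPT_TEMPLATE.format(t) for t in texts[start:start + batch_size]]
        # Pad the batch to a common length so it forms a single tensor
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
        outputs = model.generate(**inputs, max_length=1024)
        for sequence in outputs:
            decoded = tokenizer.decode(sequence, skip_special_tokens=True)
            verdicts.append(classify_output(extract_option(decoded)))
    return verdicts
```

Batching trades a little padding overhead for far fewer `generate` calls, which is what dominates throughput on GPU.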
# Cite

```bibtex
@misc{wang2024pclgptlargelanguagemodel,