DUTIR-Wang committed
Commit b43472c · verified · 1 Parent(s): 5b2df61

Update README.md

Files changed (1)
  1. README.md +33 -22
README.md CHANGED
@@ -16,26 +16,29 @@ Our paper can be accessed here. Paper link: [https://arxiv.org/abs/2410.00361](h
  Our github [https://github.com/dut-laowang/emnlp24-PclGPT](https://github.com/dut-laowang/emnlp24-PclGPT)

  # Inference
- After downloading the weights, use the following code for single-sample inference with PclGPT-EN.

  ```python
- from transformers import LlamaTokenizer, LlamaForCausalLM
-
- # Load the LLaMA-based model and tokenizer
- tokenizer = LlamaTokenizer.from_pretrained("DUTIR-Wang/PclGPT-EN")
- model = LlamaForCausalLM.from_pretrained("DUTIR-Wang/PclGPT-EN").half().cuda()

  def generate_response():
-     # Sample text to classify
-     sample_text = "For someone who's just a mere street sweeper, you sure think highly of yourself."

      instruction = (
-         "Suppose you are a linguist and you are asked to judge whether a text is patronizing and condescending. "
-         "Patronizing and condescending language expresses a sense of superiority or belittles others, making them feel inferior or incapable. "
-         "# Your return: Based on the following conversation, make a decision and return your choice. "
-         "FALSE means the text is not a condescending language, and TRUE means the text is a condescending language. "
-         "-> here is the text: ({})"
      ).format(sample_text)

      inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
      outputs = model.generate(**inputs, max_length=1024)
      output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
@@ -43,25 +46,33 @@ def generate_response():
      return classify_output(model_output)

  def extract_option(output_text):
-     options = ['TRUE', 'FALSE']
-     for option in options:
-         if option in output_text:
-             return option
-     return "Could not detect."

  def classify_output(model_output):
-     # Map the TRUE/FALSE label to a readable verdict
-     return "The text is {}a condescending language.".format(
-         "not " if model_output == "FALSE" else ""
-     )

  response = generate_response()
  print(response)
  ```
  The output will be
  ```
- "The text is a condescending language."
  ```
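The removed EN-side helpers can be sanity-checked without loading the model. Below is a minimal standalone sketch of the same post-processing logic (the simulated decoded string is an assumption, not real model output); note that because the instruction itself contains both labels, real decoded text should have the prompt stripped before scanning:

```python
def extract_option(output_text):
    # Return the first label found in the decoded text
    options = ['TRUE', 'FALSE']
    for option in options:
        if option in output_text:
            return option
    return "Could not detect."

def classify_output(model_output):
    # Map the label to a readable verdict
    return "The text is {}a condescending language.".format(
        "not " if model_output == "FALSE" else ""
    )

# Simulated model continuation instead of a real generate() call
simulated_output = "TRUE"
print(classify_output(extract_option(simulated_output)))
# -> The text is a condescending language.
```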
 
  # Cite
  ```bibtex
  @misc{wang2024pclgptlargelanguagemodel,
 
  Our github [https://github.com/dut-laowang/emnlp24-PclGPT](https://github.com/dut-laowang/emnlp24-PclGPT)

  # Inference
+ After downloading the weights, use the following code for single-sample inference with PclGPT-CN.
+
  ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True)
+ model = AutoModel.from_pretrained("DUTIR-Wang/PclGPT-CN", trust_remote_code=True).half().cuda()

  def generate_response():
+     # Sample text ("For a mere street sweeper, you sure are impressive" – sarcastic)
+     sample_text = "区区扫地的,你可真是厉害坏了"

      instruction = (
+         "假定你是一名语言学家,检测居高临下言论。居高临下言论是优势地位群体针对弱势群体的优越言论,"
+         "语言攻击性较弱,但往往夹杂着讽刺,刻板印象,会对弱势群体造成伤害。"
+         "居高临下根据语气程度分为 A级:非居高临下(正常非歧视、非冷暴力言论),"
+         "B级:轻微居高临下(语气略带鼓励或怜悯),C级:中等居高临下(说话人较为客观陈述,但语气带有歧视),"
+         "D级:严重居高临下(说话人语气轻蔑,严重歧视弱势群体)。"
+         "接下来将给你一段文本,根据上述规则,你负责判断该文本属于(A/B/C/D级)的哪一级,并只回答选项。"
+         "-> 文本:({})"
      ).format(sample_text)

+     # Tokenize and run model inference
      inputs = tokenizer(instruction, return_tensors="pt").to("cuda")
      outputs = model.generate(**inputs, max_length=1024)
      output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

      return classify_output(model_output)

  def extract_option(output_text):
+     # Scan from the end so the last A/B/C/D (the model's answer) wins
+     options = ['A', 'B', 'C', 'D']
+     for char in reversed(output_text.strip()):
+         if char in options:
+             return char
+     return "无法识别的输出"  # "Unrecognized output"

  def classify_output(model_output):
+     # Return the explanation for the option the model produced
+     if model_output == "A":
+         return "判断为A级:非居高临下"  # Class A: not condescending
+     elif model_output == "B":
+         return "判断为B级:轻微居高临下"  # Class B: mildly condescending
+     elif model_output == "C":
+         return "判断为C级:中等居高临下"  # Class C: moderately condescending
+     elif model_output == "D":
+         return "判断为D级:严重居高临下"  # Class D: severely condescending
+     else:
+         return "无法识别的输出,请检查输入或模型输出"  # Unrecognized output; check the input or model output

  response = generate_response()
  print(response)
  ```
  The output will be (i.e. "Class D: severely condescending"):
  ```
+ "判断为D级:严重居高临下"
  ```
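The CN-side `extract_option` scans the decoded text backwards, so a grade letter echoed from the instruction (which mentions A/B/C/D级) does not shadow the model's final answer. A standalone sketch of that scan (the simulated strings are assumptions, not real model output):

```python
def extract_option(output_text):
    # Walk the text backwards; the last A/B/C/D seen is the model's answer
    options = ['A', 'B', 'C', 'D']
    for char in reversed(output_text.strip()):
        if char in options:
            return char
    return "无法识别的输出"  # "Unrecognized output"

# The prompt echo contains A/B/C/D, but the answer appended last is returned
simulated_output = "判断该文本属于(A/B/C/D级)的哪一级 -> D"
print(extract_option(simulated_output))  # -> D
```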
+
  # Cite
  ```bibtex
  @misc{wang2024pclgptlargelanguagemodel,