Bug in README implementation

#3
by jvelja - opened

Hi,

The implementation you suggest in the README has a subtle bug: applying the chat template with tokenize=False and then tokenizing the resulting string yields a sequence with a duplicated BOS token:

from transformers import AutoTokenizer

# Illustrative model ID; any tokenizer whose chat template prepends a BOS
# token (e.g. Llama-3-style instruct models) reproduces the issue.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

message = [
  {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?"},
  {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
kwargs = {"padding": "longest", "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)  # tokenizing after chat-templating adds a redundant BOS token

# >> yields an input_ids tensor that when decoded is:
# >> ["<|begin_of_text|><|begin_of_text|><|start_header_id|>system...

The cleaner fix is to drop the extra tokenization step (tokens = tokenizer.encode_plus(...)) entirely and let .apply_chat_template handle tokenization itself:

tokens = tokenizer.apply_chat_template(message, tokenize=True, return_dict=True, **kwargs)
# >> yields an input_ids tensor that when decoded is:
# >> ["<|begin_of_text|><|start_header_id|>system...
Owner

Thanks for reporting this. Someone else had noticed it earlier as well and confirmed that it doesn’t affect performance. I’ll update the README to mention this, but will keep the original code since all my previous testing was done using it.
