Why does the chat prompt template add dates to the messages?

#179
by Krooz - opened

I tried to do

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

chat = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(tokenizer.apply_chat_template(chat, tokenize=False))

Output:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm doing great. How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'd like to show off how chat templating works!<|eot_id|>

Why am I getting dates during the tokenization process? This may impact performance when fine-tuning the model for a custom task. Also, the date (26 Jul 2024) isn't current either.

You can change today's date by setting the date_string variable in apply_chat_template:

from transformers import AutoTokenizer

model_path = "C:/Models/meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    { "role": "system", "content": "You are a useful and respectable chatbot." },
    { "role": "user", "content": "What day is it today?" },
    { "role": "assistant", "content": "It's Christmas!" }
]

text = tokenizer.apply_chat_template(messages, tokenize=False, date_string="25 Dec 2025")

print(text)

Output:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 25 Dec 2025

You are a useful and respectable chatbot.<|eot_id|><|start_header_id|>user<|end_header_id|>

What day is it today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

It's Christmas!<|eot_id|>

For the knowledge cutoff date, you can edit the chat_template in the tokenizer_config.json file.
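As a minimal sketch of that idea (assuming the cutoff text is hard-coded verbatim in the Jinja template, and using a made-up replacement date and output folder), you can also patch the template programmatically and save the result back with the tokenizer files instead of editing the JSON by hand:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# The chat template is a Jinja string stored on the tokenizer object.
# Assumption: the cutoff text appears verbatim, so a plain string replace works.
tokenizer.chat_template = tokenizer.chat_template.replace(
    "Cutting Knowledge Date: December 2023",
    "Cutting Knowledge Date: June 2024",
)

# Saving writes the edited template back alongside the tokenizer files
# (depending on your transformers version, inside tokenizer_config.json
# or as a separate chat_template.jinja).
tokenizer.save_pretrained("./Llama-3.1-8B-Instruct-custom-cutoff")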

Since the date_string variable is just a string, you can format these dates like this:

from datetime import datetime

# Same code as above

today_date = datetime.now().strftime("%d %b %Y")

training_data = today_date + "\n" + "Fine-Tuning Date: 25 Dec 2024"

text = tokenizer.apply_chat_template(messages, tokenize=False, date_string=training_data)

print(text)

Output:

Cutting Knowledge Date: December 2023
Today Date: 14 Mar 2025
Fine-Tuning Date: 25 Dec 2024

Is it OK if I remove the date system prompt?

I don't know; it seems that the chat template always adds the cutoff date and the current date, even if you don't provide a system prompt.

You can try editing the chat_template in the tokenizer_config.json file and see whether it works, or whether you get errors or hallucinations when generating responses.
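If you want to check this before touching any files, here is a small sketch that prints the raw Jinja template so you can see where the date header comes from, and confirms that the header is rendered even when no system message is passed (as in the very first example above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Print the raw Jinja template to locate the block that emits
# "Cutting Knowledge Date" and "Today Date".
print(tokenizer.chat_template)

# The date header is rendered even when the conversation has no system message.
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}], tokenize=False
)
print("Cutting Knowledge Date" in rendered)  # True with the stock template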
