Why does the chat template add dates to the messages?
I tried the following:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
chat = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(tokenizer.apply_chat_template(chat, tokenize=False))
Output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
I'm doing great. How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>
I'd like to show off how chat templating works!<|eot_id|>
Why am I getting dates in the templated output? This may hurt performance when fine-tuning the model for a custom task. Also, the date (26 Jul 2024) is not updated to the current date.
You can change today's date by setting the date_string variable in apply_chat_template:
from transformers import AutoTokenizer
model_path = "C:/Models/meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
messages = [
{ "role": "system", "content": "You are a useful and respectable chatbot." },
{ "role": "user", "content": "What day is it today?" },
{ "role": "assistant", "content": "It's Christmas!" }
]
text = tokenizer.apply_chat_template(messages, tokenize = False, date_string = "25 Dec 2025")
print(text)
Output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 25 Dec 2025
You are a useful and respectable chatbot.<|eot_id|><|start_header_id|>user<|end_header_id|>
What day is it today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
It's Christmas!<|eot_id|>
To change the knowledge cutoff date, edit the chat_template entry in the tokenizer_config.json file.
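As a rough sketch of that edit: the chat template is a Jinja string, so the cutoff line can be swapped with a plain string replacement. The helper name `replace_cutoff_date` and the shortened `template_excerpt` below are illustrative assumptions, not part of transformers; the real template is much longer and its exact wording may differ between model versions.

```python
def replace_cutoff_date(template: str, new_date: str,
                        old_date: str = "December 2023") -> str:
    """Return a copy of the chat template with the knowledge cutoff date swapped."""
    old_line = f"Cutting Knowledge Date: {old_date}"
    if old_line not in template:
        # Fail loudly rather than silently returning an unchanged template.
        raise ValueError(f"{old_line!r} not found in the template")
    return template.replace(old_line, f"Cutting Knowledge Date: {new_date}")


# Hypothetical, shortened stand-in for the real template text.
template_excerpt = (
    '{{- "Cutting Knowledge Date: December 2023\\n" }}\n'
    '{{- "Today Date: " + date_string + "\\n\\n" }}\n'
)

patched = replace_cutoff_date(template_excerpt, "June 2024")
print(patched)
```

With a loaded tokenizer, the same helper could be applied to `tokenizer.chat_template` (a plain string attribute) and the result assigned back, which should have the same effect as editing the file by hand.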
Since the date_string variable is just a string, you can format the date however you like, or even append extra lines to it:
from datetime import datetime
# Same code as above
today_date = datetime.now().strftime("%d %b %Y")
training_data = today_date + "\n" + "Fine-Tuning Date: 25 Dec 2024"
text = tokenizer.apply_chat_template(messages, tokenize = False, date_string = training_data)
print(text)
Output:
Cutting Knowledge Date: December 2023
Today Date: 14 Mar 2025
Fine-Tuning Date: 25 Dec 2024
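One thing to keep in mind for fine-tuning: datetime.now() changes from day to day, so the same conversation would render differently on each run. If the dates stay in the training data, pinning the value may be preferable. A minimal sketch, where `make_date_string` is a hypothetical helper name and the chosen date is arbitrary:

```python
from datetime import date


def make_date_string(day: date) -> str:
    """Format a date in the same style as the template's default, e.g. '26 Jul 2024'."""
    # Note: %b is locale-dependent; this assumes the default (English) locale.
    return day.strftime("%d %b %Y")


# A fixed date keeps every rendered training example identical across runs.
fixed = make_date_string(date(2024, 12, 25))
print(fixed)
```

The resulting string can then be passed as date_string to apply_chat_template, exactly as in the code above.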
Is it OK if I remove the dates from the system prompt?
I don't know. It seems that the chat template always adds the cutoff date and the current date, even if you don't provide a system prompt.
You can try editing the chat_template in the tokenizer_config.json file to remove them, and then check whether generation still works or whether you get errors or hallucinations in the LLM responses.
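If you would rather not touch the template, another option is to post-process the rendered string and drop the two date lines. This is only a sketch under assumptions: `strip_date_lines` is a hypothetical helper, the `rendered` sample is copied from the output in the question, and whether the model behaves well without the dates is untested here.

```python
DATE_PREFIXES = ("Cutting Knowledge Date:", "Today Date:")


def strip_date_lines(rendered: str) -> str:
    """Drop the date lines that the template injects into the system block."""
    kept = [
        line for line in rendered.split("\n")
        if not line.startswith(DATE_PREFIXES)
    ]
    return "\n".join(kept)


# Sample taken from the rendered output shown in the question.
rendered = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "Cutting Knowledge Date: December 2023\n"
    "Today Date: 26 Jul 2024\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
    "Hello, how are you?<|eot_id|>"
)

cleaned = strip_date_lines(rendered)
print(cleaned)
```

Note that this also removes any message line that happens to start with one of the two prefixes, and it leaves an empty system block in place, so check the result on your own data before training on it.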