Training details:
LoRA fine-tuning:
- lora_r = 16
- lora_alpha = 64
- lora_dropout = 0.2
- target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
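These hyperparameters map one-to-one onto peft's LoraConfig. A minimal sketch of the assumed adapter setup (a reconstruction, not the original training script):

# Assumed reconstruction of the LoRA adapter config via peft
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.2,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
    task_type="CAUSAL_LM",
)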
Training args:
- num_train_epochs = 1.25
- per_device_train_batch_size = 1
- per_device_eval_batch_size = 1
- gradient_accumulation_steps = 4
- learning_rate = 2e-5
- weight_decay = 0.001
- optim = "paged_adamw_8bit"
- lr_scheduler_type = "constant"
- warmup_ratio = 0.03
- max_seq_length = 1024
- neftune_noise_alpha = 5
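These settings correspond to Hugging Face TrainingArguments fields, with max_seq_length passed separately to the SFT trainer. A hedged reconstruction (assumed, not the original launcher script):

# Assumed reconstruction of the trainer setup (transformers TrainingArguments)
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="azma-deepseek-1.3b-instruct-v4",  # hypothetical output path
    num_train_epochs=1.25,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    neftune_noise_alpha=5,  # NEFTune noise added to embeddings during training
)
# max_seq_length=1024 is an SFTTrainer argument in trl, not a TrainingArguments field.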
Extra added tokens:
- Task:
- Output Schema:
- <|system|>
- <|user|>
- <|assistant|>
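To reproduce the tokenizer, these markers would be registered as additional special tokens and the embedding matrix resized. A sketch under the assumption that deepseek-ai/deepseek-coder-1.3b-instruct is the base checkpoint:

# Assumed workflow for adding the extra tokens (base checkpoint is a guess)
from transformers import AutoTokenizer, AutoModelForCausalLM

base = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

tokenizer.add_special_tokens({
    "additional_special_tokens": [
        "Task:", "Output Schema:", "<|system|>", "<|user|>", "<|assistant|>",
    ]
})
model.resize_token_embeddings(len(tokenizer))  # extend embeddings to cover the new ids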
Usage:
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")
model = AutoModelForCausalLM.from_pretrained(
    "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
eval_prompt = """<|system|>
Task: Given a list of previous user queries, predict 3 future queries.
Output Schema:
{'properties': {'predicted_queries': {'description': 'The list of predicted queries', 'items': {'type': 'string'}, 'title': 'Predicted Queries', 'type': 'array'}}, 'required': ['predicted_queries'], 'title': 'ResponseModel', 'type': 'object'}<|end▁of▁sentence|><|user|>
# Previous queries:
---
- Your core strength lies in understanding user intent and delivering clear, truthful, and empathetic responses.
- Utilize the provided reference information to enhance your responses and ensure accuracy.
- Always cross-reference the information for reliability, as references may vary in accuracy.
---<|end▁of▁sentence|><|assistant|>"""
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
model.eval()
with torch.no_grad():
    stop_token_id = tokenizer.convert_tokens_to_ids("<|end▁of▁sentence|>")
    gen_config = model.generation_config
    gen_config.temperature = 0.1  # only takes effect when do_sample=True
    gen_config.max_length = 500
    gen_config.eos_token_id = stop_token_id  # stop on DeepSeek's end-of-sentence token
    output = model.generate(**model_input, generation_config=gen_config)
decoded_output = [tokenizer.decode(sequence) for sequence in output]
print(decoded_output[0])
# Output:
# {"predicted_queries": ["How can I effectively communicate my core strength to users?", "What are some effective strategies for delivering accurate and truthful responses to users?", "How can I ensure accuracy and reliability of my responses to users?"]}