
Training details (a configuration sketch follows the list):

  • LoRA fine-tuning:

    • lora_r = 16
    • lora_alpha = 64
    • lora_dropout = 0.2
    • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
  • Training args:

    • num_train_epochs = 1.25
    • per_device_train_batch_size = 1
    • per_device_eval_batch_size = 1
    • gradient_accumulation_steps = 4
    • learning_rate = 2e-5
    • weight_decay = 0.001
    • optim = "paged_adamw_8bit"
    • lr_scheduler_type = "constant"
    • warmup_ratio = 0.03
    • max_seq_length = 1024
    • neftune_noise_alpha = 5
  • Extra added tokens:

    • Task:
    • Output Schema:
    • <|system|>
    • <|user|>
    • <|assistant|>

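The hyperparameters above map roughly onto the following peft/trl configuration. This is a minimal sketch, not the original training script: the base model name, the dataset text field, and the exact SFTTrainer signature (which varies across trl versions) are assumptions.

# Configuration sketch (assumes peft + trl; not the original training script)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

base_model = "deepseek-ai/deepseek-coder-1.3b-instruct"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

# Register the extra tokens listed above and resize the embeddings to match
tokenizer.add_tokens(["Task:", "Output Schema:", "<|system|>", "<|user|>", "<|assistant|>"])
model.resize_token_embeddings(len(tokenizer))

peft_config = LoraConfig(
    r=16,
    lora_alpha=64,
    lora_dropout=0.2,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="azma-deepseek-1.3b-instruct-v4",  # placeholder path
    num_train_epochs=1.25,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,   # your formatted dataset (not shown here)
    dataset_text_field="text",     # assumed field name
    peft_config=peft_config,
    max_seq_length=1024,
    neftune_noise_alpha=5,         # in recent trl these two move into SFTConfig
)
trainer.train()
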
Usage:

# Load model directly
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged")
model = AutoModelForCausalLM.from_pretrained(
  "AswanthCManoj/azma-deepseek-1.3b-instruct-v4-merged",
  low_cpu_mem_usage=True,
  return_dict=True,
  torch_dtype=torch.float16,
  device_map={"": 0},
)

eval_prompt = """<|system|>
Task: Given a list of previous user queries, predict 3 future queries.

Output Schema:
{'properties': {'predicted_queries': {'description': 'The list of predicted queries', 'items': {'type': 'string'}, 'title': 'Predicted Queries', 'type': 'array'}}, 'required': ['predicted_queries'], 'title': 'ResponseModel', 'type': 'object'}<|end▁of▁sentence|><|user|>
# Previous queries:
---
- Your core strength lies in understanding user intent and delivering clear, truthful, and empathetic responses.
- Utilize the provided reference information to enhance your responses and ensure accuracy.
- Always cross-reference the information for reliability, as references may vary in accuracy.
---<|end▁of▁sentence|><|assistant|>"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    stop_token_id = tokenizer.convert_tokens_to_ids("<|end▁of▁sentence|>")
    gen_config = model.generation_config
    gen_config.temperature = 0.1
    gen_config.max_length = 500
    gen_config.eos_token_id = stop_token_id
    output = model.generate(**model_input, generation_config=gen_config)
    decoded_output = [tokenizer.decode(sequence) for sequence in output]
    print(decoded_output[0])

# Output:
# {"predicted_queries": ["How can I effectively communicate my core strength to users?", "What are some effective strategies for delivering accurate and truthful responses to users?", "How can I ensure accuracy and reliability of my responses to users?"]}
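
Since the model is trained to answer with JSON matching the Output Schema in the system prompt, the completion can be validated with pydantic. The ResponseModel below is reconstructed from the schema shown above and is not code shipped with this repository:

# Sketch: validate the generated JSON against the schema from the system prompt
from pydantic import BaseModel, Field

class ResponseModel(BaseModel):
    predicted_queries: list[str] = Field(description="The list of predicted queries")

# Keep only the assistant part of the decoded text and strip the stop token
completion = decoded_output[0].split("<|assistant|>")[-1]
completion = completion.replace("<|end▁of▁sentence|>", "").strip()

result = ResponseModel.model_validate_json(completion)
print(result.predicted_queries)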