How to Get Started with the Model
How to Use This Model for Inference
This model is a LoRA (PEFT) adapter fine-tuned on Phi-4 (4-bit Unsloth). To use it, you need to:
- Load the base model
- Load the LoRA adapter
- Run inference
Install Required Libraries
Before running the code, make sure you have the necessary dependencies installed:
pip install unsloth peft transformers torch
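Since the 4-bit model needs a CUDA-capable GPU, it can help to confirm that PyTorch actually sees one before loading anything. This is a minimal sanity check added here for convenience, not part of the original card:

import torch

# Quick environment check: the 4-bit quantized model requires a CUDA GPU.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))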
Load and Run Inference
from unsloth import FastLanguageModel
from peft import PeftModel
import torch
# Load the base model
base_model_name = "unsloth/Phi-4-unsloth-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=4096,  # Must match the fine-tuning configuration
    load_in_4bit=True,
)
# Load the fine-tuned LoRA adapter
lora_model_name = "Machlovi/Phi_Fullshot"
model = PeftModel.from_pretrained(model, lora_model_name)
# Run inference
input_text = "Why do we need to go to see something?"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4)
# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
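Phi-4 is usually prompted in a chat-style format. If the adapter was trained on chat-formatted data, building the prompt with the tokenizer's chat template will likely give better results than passing raw text. The snippet below is a sketch that assumes the tokenizer ships a chat template and that a longer generation length is wanted; adjust it to match the actual training prompt format:

# Optional: format the prompt with the tokenizer's chat template (assumes one is defined).
messages = [
    {"role": "user", "content": "Why do we need to go to see something?"},
]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model.generate(prompt_ids, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))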
Notes
- This model is quantized in 4-bit for efficiency.
- Ensure max_seq_length matches the training configuration.
- This model requires a GPU (CUDA) for inference.
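For faster generation, Unsloth's published notebooks switch the loaded model into its optimized inference mode before calling generate. The sketch below assumes the installed Unsloth version exposes FastLanguageModel.for_inference and that it accepts the PEFT-wrapped model; verify against your Unsloth version:

# Optional: enable Unsloth's optimized inference path (assumption: supported by your Unsloth version).
FastLanguageModel.for_inference(model)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))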
Uploaded model
- Developed by: Machlovi
- License: apache-2.0
- Finetuned from model: unsloth/Phi-4-unsloth-bnb-4bit
This Phi-4 model was trained 2x faster with Unsloth and Hugging Face's TRL library.