# WIDDX-AI-v1
WIDDX-AI-v1 is a conversational AI assistant fine-tuned from Qwen2.5-7B-Instruct using LoRA (Low-Rank Adaptation), optimized for instruction following and general conversation.
## Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Languages: English, Arabic
- Model Type: Causal Language Model
- License: Apache 2.0
## Capabilities
This model excels at:
- General Conversation: Natural, engaging dialogue
- Instruction Following: Accurate execution of user instructions
- Code Generation: Writing and explaining code in multiple programming languages
- Question Answering: Providing informative responses on a wide range of topics
- Text Analysis: Understanding and processing textual content
- Creative Writing: Generating creative content and stories
## Usage
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("WIDDX-AI/WIDDX-AI-v1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("WIDDX-AI/WIDDX-AI-v1", trust_remote_code=True)

# Prepare input
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```
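Multi-turn conversations follow the same pattern: append each exchange to the `messages` list and re-apply the chat template on every turn. A minimal sketch, reusing the `model` and `tokenizer` loaded above (the `chat` helper is illustrative, not part of the model's API):

```python
import torch

def chat(messages, max_new_tokens=256):
    """Generate a reply for a list of chat messages and return it as a string."""
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
        )
    # Strip the prompt tokens and decode only the newly generated reply
    return tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )

history = [{"role": "user", "content": "Explain machine learning in simple terms"}]
reply = chat(history)
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "Give a concrete everyday example"})
print(chat(history))
```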
### Memory-Efficient Loading
For systems with limited GPU memory, use 4-bit quantization:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

# Load with quantization
model = AutoModelForCausalLM.from_pretrained(
    "WIDDX-AI/WIDDX-AI-v1",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)
```
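Note that 4-bit loading relies on the `bitsandbytes` package (and `accelerate` for `device_map="auto"`), and typically requires a CUDA-capable GPU.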
### CPU Inference
For CPU-only inference:
```python
model = AutoModelForCausalLM.from_pretrained(
    "WIDDX-AI/WIDDX-AI-v1",
    torch_dtype=torch.float32,
    trust_remote_code=True
)
```
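Keep in mind that full-precision weights for a 7B-parameter model occupy on the order of 30 GB of RAM, and generation on CPU is considerably slower than on GPU.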
## Training Details
### Dataset
- Custom instruction-following dataset with high-quality examples
- Diverse topics covering conversation, coding, analysis, and creativity
- Carefully curated and filtered for quality and safety
### Training Parameters
- Method: LoRA (Low-Rank Adaptation)
- Rank: 16
- Alpha: 32
- Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Learning Rate: 2e-4
- Batch Size: 4 (with gradient accumulation)
- Epochs: 3
### Hardware
- Training conducted on NVIDIA GPUs with 4-bit quantization
- Compatible with both GPU and CPU inference
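As a rough illustration, the hyperparameters and hardware notes above correspond to a standard QLoRA-style setup with the PEFT library. This is a sketch of how such a configuration is typically assembled, not the exact training script used for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit base model, as described in the hardware notes
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapter with the hyperparameters listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# Training then proceeds with a standard supervised fine-tuning loop using
# learning rate 2e-4, batch size 4 with gradient accumulation, and 3 epochs.
```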
## Performance
The model demonstrates strong performance across various tasks:
- High instruction following accuracy
- Natural conversational flow
- Accurate code generation and explanation
- Comprehensive knowledge across multiple domains
## Limitations
- May occasionally generate incorrect factual information
- Performance depends on input quality and clarity
- Limited by the knowledge cutoff of the base model
- Requires appropriate prompt formatting for optimal results
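In practice, prompts should be built with `tokenizer.apply_chat_template`, as shown in the usage examples above, rather than passed as raw strings.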
## Safety and Bias
This model inherits the safety measures and potential biases from its base model. Users should:
- Review outputs for appropriateness
- Implement additional safety measures in production
- Be aware of potential cultural and linguistic biases
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{widdx-ai-v1,
  title={WIDDX-AI-v1: A Fine-tuned Conversational Assistant},
  author={WIDDX-AI},
  year={2024},
  url={https://huggingface.co/WIDDX-AI/WIDDX-AI-v1}
}
```
## Contact
For questions, issues, or collaboration opportunities, please visit our repository or contact the WIDDX-AI team.
Built with ❤️ by the WIDDX-AI team