
metadata
license: mit
language:
- en
- hi
- bn
- ta
- te
- ur
- gu
- kn
- ml
- pa
- or
- as
- mr
tags:
- qwen2
- indian-languages
- conversational-ai
- localized-ai
- indic-nlp
- multilingual
- hindi
- bengali
- tamil
- telugu
- urdu
- gujarati
- kannada
- malayalam
- punjabi
- odia
- assamese
- marathi
base_model: Qwen/Qwen2.5-0.5B
pipeline_tag: text-generation
library_name: transformers
datasets:
- ai4bharat/indic-corpus
- indicnlp/hindi-corpus
- custom-indian-datasets
metrics:
- perplexity
- bleu
- rouge
model-index:
- name: anki-qwen-2.5
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: indian-benchmark
      name: Indian Language Evaluation
    metrics:
    - type: perplexity
      value: 12.5
      name: Perplexity
🇮🇳 Anki Qwen 2.5 - Indian Market-Centric LLM
🚀 Model Overview
Anki Qwen 2.5 is a language model built for the Indian market. It is based on Qwen 2.5 (0.5B parameters) and fine-tuned to handle local languages, cultural contexts, and use cases common across India.
The fine-tuning targets three areas where general-purpose models often fall short for Indian users:
- Indic Language Understanding: Deep comprehension of Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
- Cultural Context Awareness: Understanding of Indian customs, festivals, traditions, and social dynamics
- Market-Specific Applications: Tailored for Indian business scenarios, educational contexts, and daily life interactions
✨ Key Features
🌐 Indic Language Excellence
- Multi-script Support: Handles Devanagari, Bengali, Tamil, Telugu, Perso-Arabic (Urdu), Gujarati, and other Indian scripts
- Code-mixing Capability: Processes Hinglish (mixed Hindi and English) and other code-mixed text, as well as Indian English
- Regional Dialects: Understanding of regional variations and colloquialisms
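To make the multi-script and code-mixing points above concrete, here is a minimal sketch of how an application might detect which scripts appear in a piece of user input before sending it to the model. It is not part of the model or its tokenizer; the function names `script_counts` and `is_code_mixed` are hypothetical, and the approach relies only on Unicode character names from the Python standard library.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, using the first word of each Unicode name."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            # Skips spaces, digits, punctuation, and combining vowel signs.
            continue
        # Unicode names begin with the script, e.g. "DEVANAGARI LETTER KA".
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        counts[script] += 1
    return counts


def is_code_mixed(text: str) -> bool:
    """Treat text as code-mixed when letters from more than one script appear."""
    return len(script_counts(text)) > 1


if __name__ == "__main__":
    sample = "मुझे help चाहिए"  # Devanagari mixed with Latin
    print(script_counts(sample))
    print(is_code_mixed(sample))
```

This only catches mixing across scripts. Romanized Hinglish such as "mujhe help chahiye" is entirely Latin and would need a language-level classifier instead.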
💬 Advanced Conversational Ability
- Contextual Conversations: Maintains context across long dialogues in multiple languages
- Cultural Sensitivity: Responds appropriately to Indian cultural references and contexts
- Formal & Informal Registers: Adapts tone based on conversation requirements
🎯 Market Specificity
- Indian Business Context: Understanding of Indian market dynamics, regulations, and practices
- Educational Alignment: Aligned with Indian educational curricula and learning patterns
- Rural-Urban Bridge: Capable of addressing both urban and rural use cases effectively
🔧 Technical Details
Architecture
- Base Model: Qwen 2.5 (0.5B parameters)
- Fine-tuning: Specialized training on Indian datasets
- Model Size: 494M parameters
- Precision: F32 tensor type
- Context Length: Up to 8K tokens
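As a rough guide to hardware requirements, the parameter count and precision listed above translate into a weight-memory estimate. The sketch below is plain arithmetic; the helper name `weight_memory_gib` is hypothetical, and the estimate covers weights only, not activations or the key-value cache used during generation.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    params = 494_000_000  # model size stated in this card
    # Bytes per parameter for common tensor types.
    for dtype, size in {"float32": 4, "float16 / bfloat16": 2}.items():
        print(f"{dtype}: {weight_memory_gib(params, size):.2f} GiB")
```

At F32 the weights come to a little under 2 GiB. Loading in a 16-bit type would halve that, though the card does not state how the model behaves at reduced precision.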
Training Data
- Indic Corpus: Comprehensive collection from AI4Bharat
- Hindi Literature: Classical and contemporary Hindi texts
- Multilingual Datasets: Balanced representation across 12+ Indian languages
- Domain-Specific Data: Business, education, healthcare, and government domains
- Cultural Content: Festivals, traditions, mythology, and historical references
Licensing
- Weights: Open weights under MIT License
- Commercial Use: Permitted under the MIT terms (retain the copyright and license notice)
- Research Use: Fully open for academic and research purposes
🎯 Use Cases
🎬 Hindi/Indian Language Content Creation
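# Generate Hindi poetry or stories (assumes `model` and `tokenizer` are loaded as in Quick Start)
# Prompt: "Write a beautiful poem in Hindi about Holi"
prompt = "हिंदी में एक सुंदर कविता लिखें होली के बारे में"
# generate() expects token IDs, not a raw string, so tokenize first
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)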
📊 Market Analysis & Business Intelligence
- Indian market trend analysis
- Customer sentiment analysis in local languages
- Regional business strategy recommendations
- Compliance and regulatory guidance
🌾 Rural Technology Enablement
- Agricultural advisory in local languages
- Government scheme explanations
- Digital literacy support
- Local language interfaces for apps
🎓 Educational Support
- Multilingual tutoring assistance
- Curriculum-aligned content generation
- Language learning support
- Cultural education resources
💼 Enterprise Applications
- Customer support in regional languages
- Document translation and summarization
- Indian law and regulation interpretation
- HR and recruitment assistance
🛠️ How to Use
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the model and tokenizer
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto",  # requires the `accelerate` package
)
# Generate text in Hindi (prompt: "The future of AI in India")
prompt = "भारत में AI का भविष्य"
# Tokenize with an attention mask and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Advanced Usage
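# Multi-language conversation (assumes `model` and `tokenizer` are loaded as in Quick Start)
conversation = [
    # "I need a marketing strategy for my business."
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},
]
# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
# Generate response; sampling must be enabled for temperature to take effect
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)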
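For longer dialogues, the conversation list eventually outgrows the context length given under Technical Details. The following is a minimal sketch of one way to trim history, assuming an 8,192-token budget (our reading of "8K") and a caller-supplied `count_tokens` function. Both `trim_history` and `count_tokens` are hypothetical names, not part of the model or of `transformers`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[str], int],
    budget: int = 8192,  # assumed reading of the "8K tokens" context length
) -> List[Message]:
    """Keep the most recent messages whose combined token count fits the budget.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest turns are considered first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if kept and used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "a b c"},
        {"role": "assistant", "content": "d e"},
        {"role": "user", "content": "f"},
    ]
    # Whitespace word count stands in for a real tokenizer here.
    print(trim_history(history, lambda s: len(s.split()), budget=3))
```

With a loaded tokenizer, `count_tokens` could be `lambda s: len(tokenizer.encode(s))`. The chat template adds its own role and separator tokens, so leave headroom below the model's limit for those and for the generated reply.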
Integration with Popular Frameworks
# Using with LangChain for Indian applications
# Requires the `langchain-huggingface` package; older LangChain releases exposed this class under `langchain.llms`
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline
# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_new_tokens=512,
)
# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)
# Use in your Indian language applications
response = llm.invoke("Explain GST rules in Hindi")
🤝 Community & Contributions
📢 Call to Action
We invite the Indian AI community to:
- 🔬 Experiment: Try the model with your specific use cases and share results
- 📝 Feedback: Report performance insights, especially for regional languages
- 🌍 Language Expansion: Help us improve coverage for underrepresented Indian languages
- 🤝 Collaborate: Contribute training data, evaluation benchmarks, or model improvements
- 📚 Research: Use this model as a foundation for Indian language research
💬 Community Channels
- Discussions: Use the Community tab above for questions and suggestions
- Issues: Report bugs or request features in our repository
- Research: Cite this model in your academic work and share findings
🎯 Specific Areas Seeking Community Input
- Regional Dialects: Help improve understanding of local variations
- Domain Expertise: Contribute specialized knowledge (legal, medical, technical)
- Evaluation Metrics: Develop Indian language-specific benchmarks
- Cultural Nuances: Enhance cultural context understanding
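For contributors working on evaluation, the metadata of this card reports a perplexity figure but does not describe how it was measured. The sketch below shows the standard definition, perplexity as the exponential of the mean negative log-likelihood per token. It is an illustration of the metric, not the procedure used to produce the reported number, and the function name `perplexity` is our own.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood over the evaluated tokens.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 4))
```

Perplexity is computed per token, so it depends on the tokenizer. Scores for models with different tokenizers are not directly comparable, which matters for Indic scripts where tokenizers can split the same word into very different numbers of pieces.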
🙏 Acknowledgments
📊 Datasets & Resources
- AI4Bharat: For the comprehensive Indic language corpus
- IndicNLP: For Hindi language resources and benchmarks
- CDAC: For language technology tools and resources
- IIT Madras: For Tamil language processing contributions
- ISI Kolkata: For Bengali language datasets
🤝 Contributors & Community
- Anktechsol Team: Core development and fine-tuning
- Indian AI Research Community: Feedback and validation
- Open Source Contributors: Bug fixes and improvements
- Beta Testers: Early adopters who provided crucial feedback
🏢 Institutional Support
- Qwen Team: For the excellent base model architecture
- Hugging Face: For model hosting and distribution platform
- Indian Language Technology Consortium: For linguistic resources
📖 Citation
If you use this model in your research or applications, please cite:
@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}
🚀 Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!
Made with ❤️ for the Indian AI community
📋 Model Information
| Attribute | Value |
|---|---|
| Model Size | 494M parameters |
| Base Model | Qwen 2.5 |
| Languages | 12+ Indian languages + English |
| License | MIT |
| Context Length | 8K tokens |
| Precision | F32 |
| Training Data | Indian-centric multilingual corpus |
| Use Cases | Conversational AI, Content Generation, Market Analysis |
For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.