---
license: mit
language:
  - en
  - hi
  - bn
  - ta
  - te
  - ur
  - gu
  - kn
  - ml
  - pa
  - or
  - as
  - mr
tags:
  - qwen2
  - indian-languages
  - conversational-ai
  - localized-ai
  - indic-nlp
  - multilingual
  - hindi
  - bengali
  - tamil
  - telugu
  - urdu
  - gujarati
  - kannada
  - malayalam
  - punjabi
  - odia
  - assamese
  - marathi
base_model: Qwen/Qwen2.5-0.5B
pipeline_tag: text-generation
library_name: transformers
datasets:
  - ai4bharat/indic-corpus
  - indicnlp/hindi-corpus
  - custom-indian-datasets
metrics:
  - perplexity
  - bleu
  - rouge
model-index:
  - name: anki-qwen-2.5
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: indian-benchmark
          name: Indian Language Evaluation
        metrics:
          - type: perplexity
            value: 12.5
            name: Perplexity
---

🇮🇳 Anki Qwen 2.5 - Indian Market-Centric LLM

🚀 Model Overview

Anki Qwen 2.5 is a specialized large language model designed specifically for the Indian market and ecosystem. Built upon the robust Qwen 2.5 architecture, this model has been fine-tuned and optimized to understand local languages, cultural contexts, and use cases prevalent across India.

This model bridges the gap between global AI capabilities and local Indian needs, offering enhanced performance in:

  • Indic Language Understanding: Deep comprehension of Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
  • Cultural Context Awareness: Understanding of Indian customs, festivals, traditions, and social dynamics
  • Market-Specific Applications: Tailored for Indian business scenarios, educational contexts, and daily life interactions

✨ Key Features

🌐 Indic Language Excellence

  • Multi-script Support: Handles Devanagari, Bengali, Tamil, Telugu, Urdu, Gujarati, and other Indian scripts
  • Code-mixing Capability: Seamlessly processes Hinglish and other code-mixed Indian language and English text (illustrated in the sketch below)
  • Regional Dialects: Understanding of regional variations and colloquialisms
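
To illustrate the code-mixing point, here is a minimal sketch; the Hinglish prompt is only an example, and `model` and `tokenizer` are assumed to be loaded as in the Quick Start section below.

```python
# Minimal sketch: send a code-mixed (Hinglish) prompt to the model.
# Prompt: "Write me a short marketing message for a Diwali sale, mixing in some English."
prompt = "Diwali sale ke liye ek chhota sa marketing message likho, thoda English mix karke."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```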

💬 Advanced Conversational Ability

  • Contextual Conversations: Maintains context across long dialogues in multiple languages
  • Cultural Sensitivity: Responds appropriately to Indian cultural references and contexts
  • Formal & Informal Registers: Adapts tone based on conversation requirements

🎯 Market Specificity

  • Indian Business Context: Understanding of Indian market dynamics, regulations, and practices
  • Educational Alignment: Aligned with Indian educational curricula and learning patterns
  • Rural-Urban Bridge: Capable of addressing both urban and rural use cases effectively

🔧 Technical Details

Architecture

  • Base Model: Qwen 2.5 (0.5B parameters)
  • Fine-tuning: Specialized training on Indian datasets
  • Model Size: 494M parameters
  • Precision: F32 tensor type
  • Context Length: Up to 8K tokens (the sketch below shows how to verify these values)
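
A minimal way to check these numbers locally with standard transformers APIs; this is only a verification sketch, reporting whatever the published config and weights contain.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "anktechsol/anki-qwen-2.5"

# The config reports the maximum context length without downloading the weights
config = AutoConfig.from_pretrained(model_name)
print("Context length:", config.max_position_embeddings)

# Loading the weights confirms the parameter count and tensor dtype
model = AutoModelForCausalLM.from_pretrained(model_name)
print("Parameters:", sum(p.numel() for p in model.parameters()))
print("Dtype:", next(model.parameters()).dtype)
```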

Training Data

  • Indic Corpus: Comprehensive collection from AI4Bharat
  • Hindi Literature: Classical and contemporary Hindi texts
  • Multilingual Datasets: Balanced representation across 12+ Indian languages
  • Domain-Specific Data: Business, education, healthcare, and government domains
  • Cultural Content: Festivals, traditions, mythology, and historical references

Licensing

  • Weights: Open weights under MIT License
  • Commercial Use: Permitted with attribution
  • Research Use: Fully open for academic and research purposes

🎯 Use Cases

🎬 Hindi/Indian Language Content Creation

# Generate Hindi poetry or stories
# (assumes `model` and `tokenizer` are loaded as shown in the Quick Start below)
prompt = "होली के बारे में हिंदी में एक सुंदर कविता लिखें"  # "Write a beautiful poem in Hindi about Holi"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

📊 Market Analysis & Business Intelligence

  • Indian market trend analysis
  • Customer sentiment analysis in local languages (a prompt-based example follows this list)
  • Regional business strategy recommendations
  • Compliance and regulatory guidance
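
As a rough illustration, the sketch below prompts the model to label the sentiment of a Hindi customer review. The prompt wording is only an example rather than a tuned template, and `model` and `tokenizer` are assumed to be loaded as in the Quick Start section.

```python
# Prompt-based sentiment labelling for a Hindi customer review (illustrative only).
# Review: "The delivery arrived very late and the packaging was also bad."
review = "डिलीवरी बहुत देर से आई और पैकिंग भी खराब थी।"
# Prompt: "State the sentiment of this customer review (positive / negative / neutral):"
prompt = f"इस ग्राहक समीक्षा का भाव बताइए (सकारात्मक / नकारात्मक / तटस्थ):\n{review}\nभाव:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```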

🌾 Rural Technology Enablement

  • Agricultural advisory in local languages
  • Government scheme explanations
  • Digital literacy support
  • Local language interfaces for apps

🎓 Educational Support

  • Multilingual tutoring assistance
  • Curriculum-aligned content generation
  • Language learning support
  • Cultural education resources

💼 Enterprise Applications

  • Customer support in regional languages
  • Document translation and summarization
  • Indian law and regulation interpretation
  • HR and recruitment assistance

🛠️ How to Use

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

# Generate text in Hindi (prompt: "The future of AI in India")
prompt = "भारत में AI का भविष्य"
inputs = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Advanced Usage

# Multi-language conversation
conversation = [
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},
]

# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, temperature=0.8, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
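
Note that `outputs[0]` contains the chat-template prompt followed by the reply. If you only want the newly generated text, a common pattern is to slice off the prompt tokens before decoding, for example:

```python
# Decode only the tokens generated after the prompt
prompt_length = inputs["input_ids"].shape[-1]
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reply)
```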

Integration with Popular Frameworks

# Using with LangChain for Indian applications
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_length=512
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Use in your Indian language applications
response = llm("Explain GST rules in Hindi")

🤝 Community & Contributions

📢 Call to Action

We invite the Indian AI community to:

  • 🔬 Experiment: Try the model with your specific use cases and share results
  • 📝 Feedback: Report performance insights, especially for regional languages
  • 🌍 Language Expansion: Help us improve coverage for underrepresented Indian languages
  • 🤝 Collaborate: Contribute training data, evaluation benchmarks, or model improvements
  • 📚 Research: Use this model as a foundation for Indian language research

💬 Community Channels

  • Discussions: Use the Community tab above for questions and suggestions
  • Issues: Report bugs or request features in our repository
  • Research: Cite this model in your academic work and share findings

🎯 Specific Areas Seeking Community Input

  • Regional Dialects: Help improve understanding of local variations
  • Domain Expertise: Contribute specialized knowledge (legal, medical, technical)
  • Evaluation Metrics: Develop Indian language-specific benchmarks
  • Cultural Nuances: Enhance cultural context understanding

🙏 Acknowledgments

📊 Datasets & Resources

  • AI4Bharat: For the comprehensive Indic language corpus
  • IndicNLP: For Hindi language resources and benchmarks
  • CDAC: For language technology tools and resources
  • IIT Madras: For Tamil language processing contributions
  • ISI Kolkata: For Bengali language datasets

🤝 Contributors & Community

  • Anktechsol Team: Core development and fine-tuning
  • Indian AI Research Community: Feedback and validation
  • Open Source Contributors: Bug fixes and improvements
  • Beta Testers: Early adopters who provided crucial feedback

🏢 Institutional Support

  • Qwen Team: For the excellent base model architecture
  • Hugging Face: For model hosting and distribution platform
  • Indian Language Technology Consortium: For linguistic resources

📖 Citation

If you use this model in your research or applications, please cite:

@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}

🚀 Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!
Made with ❤️ for the Indian AI community

📋 Model Information

| Attribute | Value |
|-----------|-------|
| Model Size | 494M parameters |
| Base Model | Qwen 2.5 |
| Languages | 12+ Indian languages + English |
| License | MIT |
| Context Length | 8K tokens |
| Precision | F32 |
| Training Data | Indian-centric multilingual corpus |
| Use Cases | Conversational AI, content generation, market analysis |

For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.