---
license: mit
language:
  - en
  - hi
  - bn
  - ta
  - te
  - ur
  - gu
  - kn
  - ml
  - pa
  - or
  - as
  - mr
tags:
  - qwen2
  - indian-languages
  - conversational-ai
  - localized-ai
  - indic-nlp
  - multilingual
  - hindi
  - bengali
  - tamil
  - telugu
  - urdu
  - gujarati
  - kannada
  - malayalam
  - punjabi
  - odia
  - assamese
  - marathi
base_model: Qwen/Qwen2.5-0.5B
pipeline_tag: text-generation
library_name: transformers
datasets:
  - ai4bharat/indic-corpus
  - indicnlp/hindi-corpus
  - custom-indian-datasets
metrics:
  - perplexity
  - bleu
  - rouge
model-index:
  - name: anki-qwen-2.5
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: indian-benchmark
          name: Indian Language Evaluation
        metrics:
          - type: perplexity
            value: 12.5
            name: Perplexity
---

🇮🇳 Anki Qwen 2.5 - Indian Market-Centric LLM

🚀 Model Overview

Anki Qwen 2.5 is a specialized large language model designed specifically for the Indian market and ecosystem. Built upon the robust Qwen 2.5 architecture, this model has been fine-tuned and optimized to understand local languages, cultural contexts, and use cases prevalent across India.

This model bridges the gap between global AI capabilities and local Indian needs, offering enhanced performance in:

  • Indic Language Understanding: Deep comprehension of Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
  • Cultural Context Awareness: Understanding of Indian customs, festivals, traditions, and social dynamics
  • Market-Specific Applications: Tailored for Indian business scenarios, educational contexts, and daily life interactions

✨ Key Features

🌐 Indic Language Excellence

  • Multi-script Support: Handles Devanagari, Bengali, Tamil, Telugu, Urdu, Gujarati, and other Indian scripts
  • Code-mixing Capability: Seamlessly processes Hinglish and other code-mixed Indian language and English text (illustrated in the sketch below)
  • Regional Dialects: Understanding of regional variations and colloquialisms
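
To illustrate the code-mixing point, here is a minimal sketch; the Hinglish prompt is only an example, and `model` and `tokenizer` are assumed to be loaded as in the Quick Start section below.

```python
# Minimal sketch: send a code-mixed (Hinglish) prompt to the model.
# Prompt: "Write me a short marketing message for a Diwali sale, mixing in some English."
prompt = "Diwali sale ke liye ek chhota sa marketing message likho, thoda English mix karke."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```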

💬 Advanced Conversational Ability

  • Contextual Conversations: Maintains context across long dialogues in multiple languages
  • Cultural Sensitivity: Responds appropriately to Indian cultural references and contexts
  • Formal & Informal Registers: Adapts tone based on conversation requirements

🎯 Market Specificity

  • Indian Business Context: Understanding of Indian market dynamics, regulations, and practices
  • Educational Alignment: Aligned with Indian educational curricula and learning patterns
  • Rural-Urban Bridge: Capable of addressing both urban and rural use cases effectively

🔧 Technical Details

Architecture

  • Base Model: Qwen 2.5 (0.5B parameters)
  • Fine-tuning: Specialized training on Indian datasets
  • Model Size: 494M parameters
  • Precision: F32 tensor type
  • Context Length: Up to 8K tokens (the sketch below shows how to verify these values)
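
A minimal way to check these numbers locally with standard transformers APIs; this is only a verification sketch, reporting whatever the published config and weights contain.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "anktechsol/anki-qwen-2.5"

# The config reports the maximum context length without downloading the weights
config = AutoConfig.from_pretrained(model_name)
print("Context length:", config.max_position_embeddings)

# Loading the weights confirms the parameter count and tensor dtype
model = AutoModelForCausalLM.from_pretrained(model_name)
print("Parameters:", sum(p.numel() for p in model.parameters()))
print("Dtype:", next(model.parameters()).dtype)
```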

Training Data

  • Indic Corpus: Comprehensive collection from AI4Bharat
  • Hindi Literature: Classical and contemporary Hindi texts
  • Multilingual Datasets: Balanced representation across 12+ Indian languages
  • Domain-Specific Data: Business, education, healthcare, and government domains
  • Cultural Content: Festivals, traditions, mythology, and historical references

Licensing

  • Weights: Open weights under MIT License
  • Commercial Use: Permitted with attribution
  • Research Use: Fully open for academic and research purposes

🎯 Use Cases

🎬 Hindi/Indian Language Content Creation

# Generate Hindi poetry or stories
# (assumes `model` and `tokenizer` are loaded as shown in the Quick Start below)
prompt = "होली के बारे में हिंदी में एक सुंदर कविता लिखें"  # "Write a beautiful poem in Hindi about Holi"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

📊 Market Analysis & Business Intelligence

  • Indian market trend analysis
  • Customer sentiment analysis in local languages (a prompt-based example follows this list)
  • Regional business strategy recommendations
  • Compliance and regulatory guidance
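
As a rough illustration, the sketch below prompts the model to label the sentiment of a Hindi customer review. The prompt wording is only an example rather than a tuned template, and `model` and `tokenizer` are assumed to be loaded as in the Quick Start section.

```python
# Prompt-based sentiment labelling for a Hindi customer review (illustrative only).
# Review: "The delivery arrived very late and the packaging was also bad."
review = "डिलीवरी बहुत देर से आई और पैकिंग भी खराब थी।"
# Prompt: "State the sentiment of this customer review (positive / negative / neutral):"
prompt = f"इस ग्राहक समीक्षा का भाव बताइए (सकारात्मक / नकारात्मक / तटस्थ):\n{review}\nभाव:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```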

🌾 Rural Technology Enablement

  • Agricultural advisory in local languages
  • Government scheme explanations
  • Digital literacy support
  • Local language interfaces for apps

🎓 Educational Support

  • Multilingual tutoring assistance
  • Curriculum-aligned content generation
  • Language learning support
  • Cultural education resources

💼 Enterprise Applications

  • Customer support in regional languages
  • Document translation and summarization
  • Indian law and regulation interpretation
  • HR and recruitment assistance

🛠️ How to Use

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

# Generate text in Hindi (prompt: "The future of AI in India")
prompt = "भारत में AI का भविष्य"
inputs = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Advanced Usage

# Multi-language conversation
conversation = [
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},
]

# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, temperature=0.8, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
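
Note that `outputs[0]` contains the chat-template prompt followed by the reply. If you only want the newly generated text, a common pattern is to slice off the prompt tokens before decoding, for example:

```python
# Decode only the tokens generated after the prompt
prompt_length = inputs["input_ids"].shape[-1]
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reply)
```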

Integration with Popular Frameworks

# Using with LangChain for Indian applications
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_length=512
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Use in your Indian language applications
response = llm("Explain GST rules in Hindi")

🤝 Community & Contributions

📢 Call to Action

We invite the Indian AI community to:

  • 🔬 Experiment: Try the model with your specific use cases and share results
  • 📝 Feedback: Report performance insights, especially for regional languages
  • 🌍 Language Expansion: Help us improve coverage for underrepresented Indian languages
  • 🤝 Collaborate: Contribute training data, evaluation benchmarks, or model improvements
  • 📚 Research: Use this model as a foundation for Indian language research

💬 Community Channels

  • Discussions: Use the Community tab above for questions and suggestions
  • Issues: Report bugs or request features in our repository
  • Research: Cite this model in your academic work and share findings

🎯 Specific Areas Seeking Community Input

  • Regional Dialects: Help improve understanding of local variations
  • Domain Expertise: Contribute specialized knowledge (legal, medical, technical)
  • Evaluation Metrics: Develop Indian language-specific benchmarks
  • Cultural Nuances: Enhance cultural context understanding

🙏 Acknowledgments

📊 Datasets & Resources

  • AI4Bharat: For the comprehensive Indic language corpus
  • IndicNLP: For Hindi language resources and benchmarks
  • CDAC: For language technology tools and resources
  • IIT Madras: For Tamil language processing contributions
  • ISI Kolkata: For Bengali language datasets

🤝 Contributors & Community

  • Anktechsol Team: Core development and fine-tuning
  • Indian AI Research Community: Feedback and validation
  • Open Source Contributors: Bug fixes and improvements
  • Beta Testers: Early adopters who provided crucial feedback

🏢 Institutional Support

  • Qwen Team: For the excellent base model architecture
  • Hugging Face: For model hosting and distribution platform
  • Indian Language Technology Consortium: For linguistic resources

📖 Citation

If you use this model in your research or applications, please cite:

@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}

🚀 Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!
Made with ❤️ for the Indian AI community

📋 Model Information

| Attribute | Value |
|-----------|-------|
| Model Size | 494M parameters |
| Base Model | Qwen 2.5 |
| Languages | 12+ Indian languages + English |
| License | MIT |
| Context Length | 8K tokens |
| Precision | F32 |
| Training Data | Indian-centric multilingual corpus |
| Use Cases | Conversational AI, content generation, market analysis |

For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.