🇮🇳 Anki Qwen 2.5 - Indian Market-Centric LLM


🚀 Model Overview

Anki Qwen 2.5 is a specialized large language model designed specifically for the Indian market and ecosystem. Built upon the robust Qwen 2.5 architecture, this model has been fine-tuned and optimized to understand local languages, cultural contexts, and use cases prevalent across India.

This model bridges the gap between global AI capabilities and local Indian needs, offering enhanced performance in:

  • Indic Language Understanding: Deep comprehension of Hindi, Bengali, Tamil, Telugu, Urdu, Gujarati, Kannada, Malayalam, Punjabi, Odia, Assamese, and Marathi
  • Cultural Context Awareness: Understanding of Indian customs, festivals, traditions, and social dynamics
  • Market-Specific Applications: Tailored for Indian business scenarios, educational contexts, and daily life interactions

✨ Key Features

๐ŸŒ Indic Language Excellence

  • Multi-script Support: Handles Devanagari, Bengali, Tamil, Telugu, Gujarati, the Perso-Arabic script used for Urdu, and other Indian scripts
  • Code-mixing Capability: Seamlessly processes Hinglish and other code-mixed Indian-language text (see the tokenizer sketch below)
  • Regional Dialects: Understanding of regional variations and colloquialisms
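
To see the multi-script and code-mixing support in practice, here is a minimal tokenizer round-trip sketch; the sample sentences are illustrative, not drawn from the training data:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("anktechsol/anki-qwen-2.5")
samples = [
    "नमस्ते, आप कैसे हैं?",                               # Hindi (Devanagari): "Hello, how are you?"
    "வணக்கம், நீங்கள் எப்படி இருக்கிறீர்கள்?",              # Tamil: "Hello, how are you?"
    "Kal meeting hai, please sab reports ready rakhna.",  # Hinglish (code-mixed)
]
for text in samples:
    ids = tokenizer(text)["input_ids"]
    # Each script should round-trip through the tokenizer without loss
    print(len(ids), tokenizer.decode(ids))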

💬 Advanced Conversational Ability

  • Contextual Conversations: Maintains context across long dialogues in multiple languages (illustrated in the sketch after this list)
  • Cultural Sensitivity: Responds appropriately to Indian cultural references and contexts
  • Formal & Informal Registers: Adapts tone based on conversation requirements

🎯 Market Specificity

  • Indian Business Context: Understanding of Indian market dynamics, regulations, and practices
  • Educational Alignment: Aligned with Indian educational curricula and learning patterns
  • Rural-Urban Bridge: Capable of addressing both urban and rural use cases effectively

🔧 Technical Details

Architecture

  • Base Model: Qwen 2.5 (0.5B parameters)
  • Fine-tuning: Specialized training on Indian datasets
  • Model Size: 494M parameters
  • Precision: F32 tensor type
  • Context Length: Up to 8K tokens
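
These figures can be verified directly from the published checkpoint; a small sketch, assuming the standard Qwen2 config fields exposed by transformers:

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("anktechsol/anki-qwen-2.5")
print(config.model_type)               # model family reported by the config
print(config.max_position_embeddings)  # maximum context length
print(config.torch_dtype)              # storage precision of the weights

model = AutoModelForCausalLM.from_pretrained("anktechsol/anki-qwen-2.5")
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # should print roughly 494M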

Training Data

  • Indic Corpus: Comprehensive collection from AI4Bharat
  • Hindi Literature: Classical and contemporary Hindi texts
  • Multilingual Datasets: Balanced representation across 12+ Indian languages
  • Domain-Specific Data: Business, education, healthcare, and government domains
  • Cultural Content: Festivals, traditions, mythology, and historical references

Licensing

  • Weights: Open weights under MIT License
  • Commercial Use: Permitted with attribution
  • Research Use: Fully open for academic and research purposes

🎯 Use Cases

🎬 Hindi/Indian Language Content Creation

from transformers import pipeline

# Generate Hindi poetry or stories
generator = pipeline("text-generation", model="anktechsol/anki-qwen-2.5")
response = generator(
    "हिंदी में एक सुंदर कविता लिखें होली के बारे में",  # "Write a beautiful poem in Hindi about Holi"
    max_new_tokens=200,
)
print(response[0]["generated_text"])

📊 Market Analysis & Business Intelligence

  • Indian market trend analysis
  • Customer sentiment analysis in local languages
  • Regional business strategy recommendations
  • Compliance and regulatory guidance
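
For the sentiment-analysis item above, a prompt-based sketch; the review text is illustrative, and the snippet assumes the model and tokenizer from the Quick Start section below:

review = "डिलीवरी बहुत देर से आई और पैकिंग भी खराब थी।"  # "The delivery arrived very late and the packing was bad too."
# Prompt: "State the sentiment of this customer review (positive/negative/neutral):" ... "Sentiment:"
prompt = f"इस ग्राहक समीक्षा की भावना बताइए (सकारात्मक/नकारात्मक/तटस्थ): {review}\nभावना:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))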

🌾 Rural Technology Enablement

  • Agricultural advisory in local languages
  • Government scheme explanations
  • Digital literacy support
  • Local language interfaces for apps

🎓 Educational Support

  • Multilingual tutoring assistance
  • Curriculum-aligned content generation
  • Language learning support
  • Cultural education resources

💼 Enterprise Applications

  • Customer support in regional languages
  • Document translation and summarization
  • Indian law and regulation interpretation
  • HR and recruitment assistance
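
As one concrete pattern for the translation item above, a minimal prompt sketch (the sentence is illustrative; assumes the Quick Start setup below):

text_en = "The invoice must be paid within 30 days of delivery."
# Prompt: "Translate the following sentence into Hindi:" ... "Translation:"
prompt = f"निम्नलिखित वाक्य का हिंदी में अनुवाद करें: {text_en}\nअनुवाद:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))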

๐Ÿ› ๏ธ How to Use

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "anktechsol/anki-qwen-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="auto"
)

# Generate text in Hindi
prompt = "भारत में AI का भविष्य"  # "The future of AI in India"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # keep inputs on the model's device

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
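
For interactive applications you can stream tokens as they are generated; a short sketch reusing the model and tokenizer loaded above (TextStreamer prints to stdout as generation proceeds):

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer("भारत में AI का भविष्य", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, streamer=streamer)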

Advanced Usage

# Multi-language conversation
conversation = [
    {"role": "user", "content": "मुझे अपने बिजनेस के लिए एक मार्केटिंग स्ट्रैटेजी चाहिए।"},  # "I need a marketing strategy for my business."
]

# Apply chat template
formatted_prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens so the prompt is not echoed back
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

Integration with Popular Frameworks

# Using with LangChain for Indian applications
# (recent LangChain versions ship this wrapper in the langchain_huggingface package)
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Create pipeline
pipe = pipeline(
    "text-generation",
    model="anktechsol/anki-qwen-2.5",
    tokenizer="anktechsol/anki-qwen-2.5",
    max_new_tokens=512
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# Use in your Indian language applications
response = llm.invoke("Explain GST rules in Hindi")
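
The wrapper also composes with LangChain prompt templates; a small sketch using the LCEL pipe operator (the template text is illustrative):

from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template("Answer the question in {language}: {question}")
chain = template | llm  # the formatted prompt feeds straight into the model
print(chain.invoke({"language": "Hindi", "question": "What are the current GST slabs?"}))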

๐Ÿค Community & Contributions

📢 Call to Action

We invite the Indian AI community to:

  • 🔬 Experiment: Try the model with your specific use cases and share results
  • 📝 Feedback: Report performance insights, especially for regional languages
  • 🌐 Language Expansion: Help us improve coverage for underrepresented Indian languages
  • 🤝 Collaborate: Contribute training data, evaluation benchmarks, or model improvements
  • 📚 Research: Use this model as a foundation for Indian language research

💬 Community Channels

  • Discussions: Use the Community tab above for questions and suggestions
  • Issues: Report bugs or request features in our repository
  • Research: Cite this model in your academic work and share findings

🎯 Specific Areas Seeking Community Input

  • Regional Dialects: Help improve understanding of local variations
  • Domain Expertise: Contribute specialized knowledge (legal, medical, technical)
  • Evaluation Metrics: Develop Indian language-specific benchmarks (a perplexity baseline is sketched below)
  • Cultural Nuances: Enhance cultural context understanding
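
For the evaluation-metrics item, perplexity on held-out text in the target language is a common starting point; a minimal sketch, assuming the Quick Start setup and an illustrative sample sentence:

import torch

text = "भारतीय भाषाओं के लिए अच्छे मूल्यांकन बेंचमार्क बनाना ज़रूरी है।"  # "Building good evaluation benchmarks for Indian languages is essential."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss  # mean token-level cross-entropy
print(f"perplexity: {torch.exp(loss).item():.1f}")  # lower is better; compare across languages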

๐Ÿ™ Acknowledgments

📊 Datasets & Resources

  • AI4Bharat: For the comprehensive Indic language corpus
  • IndicNLP: For Hindi language resources and benchmarks
  • CDAC: For language technology tools and resources
  • IIT Madras: For Tamil language processing contributions
  • ISI Kolkata: For Bengali language datasets

๐Ÿค Contributors & Community

  • Anktechsol Team: Core development and fine-tuning
  • Indian AI Research Community: Feedback and validation
  • Open Source Contributors: Bug fixes and improvements
  • Beta Testers: Early adopters who provided crucial feedback

๐Ÿข Institutional Support

  • Qwen Team: For the excellent base model architecture
  • Hugging Face: For model hosting and distribution platform
  • Indian Language Technology Consortium: For linguistic resources

📖 Citation

If you use this model in your research or applications, please cite:

@misc{anki-qwen-2.5,
  title={Anki Qwen 2.5: An Indian Market-Centric Large Language Model},
  author={Anktechsol},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/anktechsol/anki-qwen-2.5}},
}

🚀 Ready to explore AI in Indian languages? Start using Anki Qwen 2.5 today!
Made with ❤️ for the Indian AI community

📋 Model Information

  • Model Size: 494M parameters
  • Base Model: Qwen 2.5 (Qwen/Qwen2.5-0.5B)
  • Languages: 12+ Indian languages + English
  • License: MIT
  • Context Length: 8K tokens
  • Precision: F32
  • Training Data: Indian-centric multilingual corpus
  • Use Cases: Conversational AI, content generation, market analysis

For technical support, feature requests, or collaborations, please reach out through the Community discussions or contact anktechsol directly.
