license: apache-2.0
datasets:
  - LTS-VVE/Teuta-sq
  - LTS-VVE/grammar_sq_0.1
  - LTS-VVE/linguistic_sq
  - LTS-VVE/Math-physics-dataset-sq
  - LTS-VVE/albanian-synthetic
  - noxneural/lilium_albanicum_eng_alb
  - MIND-Lab/Safety-Evaluation
  - shb777/simple-math-steps-7M
  - RishiKompelli/TherapyDataset
  - microsoft/orca-math-word-problems-200k
  - Vezora/Tested-143k-Python-Alpaca
  - AI4Chem/ChemPref-DPO-for-Chemistry-data-en
  - jkhedri/psychology-dataset
  - samhog/psychology-10k
  - Amod/mental_health_counseling_conversations
  - sayhan/strix-philosophy-qa
  - Maverfrick/Rust_dataset
  - Neloy262/rust_instruction_dataset
  - Tesslate/Rust_Dataset
language:
  - en
  - sq
base_model:
  - meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
tags:
  - al
  - math
  - philosophy
  - chemistry
  - code
  - biology
  - climate
  - not-for-all-audiences

This model is not suitable for all audiences and may contain inappropriate or explicit content.


Teuta

Teuta is a bilingual instruction-tuned language model designed for question answering in both Albanian (sq) and English (en). It is fine-tuned on a diverse mix of datasets covering subjects such as mathematics, philosophy, chemistry, biology, code (especially Rust), psychology, and climate science.

Model

  • Base model: meta-llama/Llama-3.2-3B
  • Languages: Albanian, English
  • Primary task: Instruction-following and question answering

Description

Teuta is built to handle a range of instructional prompts, from academic and scientific questions to more open-ended tasks. It is particularly suited to bilingual Albanian-English applications and to supporting under-resourced languages, with a strong focus on Albanian.

Training combines synthetic and non-synthetic datasets to improve generalization across technical and non-technical domains.
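As a rough illustration of the question-answering use described above, the following sketch queries the model through the `transformers` text-generation pipeline. Two things here are assumptions rather than facts from this card: the repository id `LTS-VVE/Teuta` is inferred from the dataset namespace in the metadata, and the `Question:`/`Answer:` prompt layout is hypothetical, since no prompt template is documented.

```python
"""Minimal sketch: ask Teuta a question via the transformers pipeline."""

# Assumption: repository id inferred from the dataset namespace in the card
# metadata; replace it if the model is published under a different name.
MODEL_ID = "LTS-VVE/Teuta"


def build_prompt(question: str) -> str:
    """Wrap a question in a plain instruction-style prompt.

    Assumption: the card documents no prompt template, so this layout is
    hypothetical and may not match the format used during fine-tuning.
    """
    return f"Question: {question.strip()}\nAnswer:"


def answer(question: str, max_new_tokens: int = 128) -> str:
    """Generate a greedy-decoded answer to a question in Albanian or English."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        build_prompt(question),
        max_new_tokens=max_new_tokens,
        do_sample=False,         # deterministic output for easier comparison
        return_full_text=False,  # return only the continuation, not the prompt
    )
    return outputs[0]["generated_text"].strip()
```

Calling `answer("Cili është kryeqyteti i Shqipërisë?")` (Albanian for "What is the capital of Albania?") downloads the weights on first use. Greedy decoding is chosen here only to make runs repeatable; sampling may suit open-ended prompts better.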

Considerations

  • Some datasets include sensitive content (e.g., mental health, therapy, and philosophical questions).
  • Outputs are not guaranteed to be factual or safe; use the model with care in sensitive contexts.
  • Best suited for research, educational tools, and domain-specific applications.