---
license: apache-2.0
datasets:
- LTS-VVE/Teuta-sq
- LTS-VVE/grammar_sq_0.1
- LTS-VVE/linguistic_sq
- LTS-VVE/Math-physics-dataset-sq
- LTS-VVE/albanian-synthetic
- noxneural/lilium_albanicum_eng_alb
- MIND-Lab/Safety-Evaluation
- shb777/simple-math-steps-7M
- RishiKompelli/TherapyDataset
- microsoft/orca-math-word-problems-200k
- Vezora/Tested-143k-Python-Alpaca
- AI4Chem/ChemPref-DPO-for-Chemistry-data-en
- jkhedri/psychology-dataset
- samhog/psychology-10k
- Amod/mental_health_counseling_conversations
- sayhan/strix-philosophy-qa
- Maverfrick/Rust_dataset
- Neloy262/rust_instruction_dataset
- Tesslate/Rust_Dataset
language:
- en
- sq
base_model:
- meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
tags:
- al
- math
- philosophy
- chemistry
- code
- biology
- climate
- not-for-all-audiences
---
<p align="center">
<span style="color:yellow">This model is not suitable for all audiences and may contain inappropriate or explicit content.</span>
</p>
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67b7476deb48853c39ca000b/CzUTg97aTxK283qwD6kEm.png" alt="Teuta Logo" />
</p>
# Teuta
Teuta is a bilingual instruction-tuned language model for question answering in Albanian (sq) and English (en). It is fine-tuned on a mix of datasets covering mathematics, philosophy, chemistry, biology, code (especially Rust), psychology, and climate science.
## Model
- **Base model**: meta-llama/Llama-3.2-3B
- **Languages**: Albanian, English
- **Primary task**: Instruction following and question answering
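
A minimal sketch of how a model like this could be loaded and queried with the Hugging Face `transformers` library. The card does not state the repository ID or a prompt template, so `MODEL_ID` and the `Question:`/`Answer:` format in `build_prompt` are assumptions for illustration only; replace them with the real repository name and whatever prompt format the model was trained on.

```python
# Assumption: hypothetical repository ID, not stated in this card.
MODEL_ID = "LTS-VVE/Teuta"


def build_prompt(question: str) -> str:
    """Wrap a question in a plain-text prompt.

    Assumption: this format is a placeholder, not the documented template.
    """
    return f"Question: {question.strip()}\nAnswer:"


def answer(question: str, max_new_tokens: int = 128) -> str:
    """Generate an answer; downloads the model weights on first call."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `answer("Çfarë është fotosinteza?")` or `answer("What is photosynthesis?")` would then exercise the Albanian and English paths respectively, provided `MODEL_ID` points at a real checkpoint.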
## Description
Teuta handles a range of instructional prompts, from academic and scientific questions to more open-ended tasks. It is particularly suited to multilingual applications and to supporting under-resourced languages, with a strong focus on Albanian.
The model is trained on both synthetic and real datasets to improve generalization across technical and non-technical domains.
## Considerations
- Some training datasets include sensitive content (e.g., mental health, therapy, and philosophical questions).
- Outputs are not guaranteed to be factual or safe; take care when using the model in sensitive contexts.
- Best suited for research, educational tools, and domain-specific applications.
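
One way to act on the points above is to screen prompts before they reach the model and route sensitive ones to a human or a dedicated resource. The sketch below is a naive keyword check, not a safety system and not part of Teuta; the `SENSITIVE_TERMS` lists and the `needs_review` helper are hypothetical and would need to be replaced by a properly evaluated classifier in any real deployment.

```python
# Assumption: illustrative term lists only, not an exhaustive or vetted set.
SENSITIVE_TERMS = {
    "en": ("therapy", "suicide", "self-harm", "depression"),
    "sq": ("terapi", "vetëvrasje", "depresion"),
}


def needs_review(prompt: str) -> bool:
    """Return True if the prompt mentions any listed sensitive term."""
    # casefold() gives case-insensitive matching for both languages.
    text = prompt.casefold()
    return any(
        term in text
        for terms in SENSITIVE_TERMS.values()
        for term in terms
    )
```

A caller could check `needs_review(user_prompt)` first and only forward the prompt to the model when it returns `False`. Substring matching like this produces both false positives and false negatives, which is why it is only a starting point.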