	---
	license: apache-2.0
	datasets:
	- LTS-VVE/Teuta-sq
	- LTS-VVE/grammar_sq_0.1
	- LTS-VVE/linguistic_sq
	- LTS-VVE/Math-physics-dataset-sq
	- LTS-VVE/albanian-synthetic
	- noxneural/lilium_albanicum_eng_alb
	- MIND-Lab/Safety-Evaluation
	- shb777/simple-math-steps-7M
	- RishiKompelli/TherapyDataset
	- microsoft/orca-math-word-problems-200k
	- Vezora/Tested-143k-Python-Alpaca
	- AI4Chem/ChemPref-DPO-for-Chemistry-data-en
	- jkhedri/psychology-dataset
	- samhog/psychology-10k
	- Amod/mental_health_counseling_conversations
	- sayhan/strix-philosophy-qa
	- Maverfrick/Rust_dataset
	- Neloy262/rust_instruction_dataset
	- Tesslate/Rust_Dataset
	language:
	- en
	- sq
	base_model:
	- meta-llama/Llama-3.2-3B
	pipeline_tag: text-generation
	tags:
	- al
	- math
	- philosophy
	- chemistry
	- code
	- biology
	- climate
	- not-for-all-audiences
	---

	<p align="center">
	<span style="color:yellow">This model is not suitable for all audiences and may generate inappropriate or explicit content.</span>
	</p>

	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/67b7476deb48853c39ca000b/CzUTg97aTxK283qwD6kEm.png" alt="Teuta Logo" />
	</p>

	# Teuta

	Teuta is a bilingual instruction-tuned language model designed for question answering in Albanian (sq) and English (en). It is fine-tuned on a mix of datasets covering mathematics, philosophy, chemistry, biology, code (Python and, especially, Rust), psychology, and climate science.

	## Model

	- Base model: meta-llama/Llama-3.2-3B
	- Languages: Albanian, English
	- Primary task: Instruction-following and question answering
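
As a minimal sketch of how the model might be queried with the Hugging Face `transformers` library: the repository id `LTS-VVE/Teuta` is taken from the hosting page, while the `Question:`/`Answer:` prompt layout and the `build_prompt` and `answer` helpers are hypothetical, since this card does not document a prompt template. Adjust both to match the format the model was actually trained on.

```python
REPO_ID = "LTS-VVE/Teuta"  # repository id as shown on the hosting page


def build_prompt(question: str) -> str:
    """Hypothetical plain-text prompt layout; the card does not specify one."""
    return f"Question: {question.strip()}\nAnswer:"


def answer(question: str, max_new_tokens: int = 128) -> str:
    """Generate a reply for a single question (downloads the model on first use)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `answer("Cili është kryeqyteti i Shqipërisë?")` would then return the generated continuation as a string. Output quality depends on the prompt format, so treat the layout above as a starting point only.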

	## Description

	Teuta is built to handle a range of instructional prompts, from academic and scientific questions to more open-ended tasks. It targets bilingual English-Albanian applications, with a particular focus on Albanian as an under-resourced language.

	The model leverages both synthetic and real datasets to improve generalization across technical and non-technical domains.

	## Considerations

	- Some datasets include sensitive content (e.g., mental health, therapy, and philosophical questions).
	- Outputs are not guaranteed to be factual or safe; take care when using the model in sensitive contexts.
	- Best suited for research, educational tools, and domain-specific applications.