---
tags:
- sentence-transformers
- sentence-similarity
- information-retrieval
- semantic-search
widget:
- source_sentence: >-
    Descrivi dettagliatamente il processo chimico e fisico che avviene durante
    la preparazione di un impasto per crostata
  sentences:
  - >-
    ## La Magia Chimica e Fisica nell'Impasto della Crostata: Un Viaggio
    Dagli Ingredienti Secchi al Trionfo del Forno
    La preparazione di una crostata, apparentemente un gesto semplice e
    familiare, cela in realtà un affascinante balletto di reazioni chimiche
    e trasformazioni fisiche...
  - >-
    ## L'Arte Effimera: Creare un Dolce Paesaggio Invernale
    Immergiamoci nel cuore pulsante della pasticceria festiva, dove l'arte
    culinaria si fonde con la creatività artistica...
  - >-
    Le piattaforme di comunicazione digitale, con la loro ubiquità
    crescente, si configurano come un'arma a doppio taglio nel panorama
    sociale contemporaneo...
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# Fine-tuned Qwen3-Embedding for Italian-English Cross-Lingual Semantic Retrieval

This model is a fine-tuned version of `Qwen/Qwen3-Embedding-0.6B`, optimized for cross-lingual semantic retrieval, with particular emphasis on Italian query understanding and multilingual document ranking.
## Model Description
- Model Type: Dense embedding model for semantic retrieval
- Base Model: Qwen/Qwen3-Embedding-0.6B
- Output Dimensionality: 1,024-dimensional dense vectors
- Maximum Sequence Length: 32,768 tokens
- Primary Languages: Italian, English
- Similarity Function: Cosine similarity
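
Because the model L2-normalizes its output vectors, cosine similarity reduces to a plain dot product. A minimal pure-Python illustration of the similarity function, using toy 4-dimensional vectors rather than real 1,024-dimensional model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (the real model outputs 1,024 dimensions)
query_vec = [0.5, 0.5, 0.5, 0.5]
doc_vec = [0.5, 0.5, 0.5, -0.5]

print(cosine_similarity(query_vec, doc_vec))  # 0.5 for these toy vectors
```

For already-normalized vectors (norm 1.0, as this model produces), the two norm terms drop out and only the dot product remains.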
## Capabilities

### Cross-Lingual Retrieval

The model matches Italian queries to English documents and vice versa, and is particularly effective in technical and academic domains.
### Domain Coverage

Trained on diverse knowledge domains, including:
- Medical & Health Sciences: Diagnostic imaging, clinical procedures, medical terminology
- STEM Fields: Physics, computer science, geology, engineering
- Professional Domains: Finance, law, agriculture, software development
- Educational Content: Historical studies, culinary arts, general knowledge
### Query Understanding

Enhanced comprehension of:
- Conversational and informal query patterns
- Technical terminology across domains
- Cross-lingual semantic concepts
- Complex multi-faceted questions
## Training Data
The model was fine-tuned on a curated corpus of Italian-English cross-lingual data, featuring high-quality triplets designed to capture semantic nuances across multiple domains. The dataset emphasizes:
- Hard negative mining: Strategic inclusion of semantically related but incorrect documents
- Cross-lingual alignment: Balanced representation of Italian-English language pairs
- Domain diversity: Comprehensive coverage of academic, professional, and conversational contexts
- Quality curation: Manual review and automated filtering for coherence and relevance
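
A training record of the kind described above pairs an anchor query with one relevant passage and one hard negative. A minimal sketch in plain Python, reusing the example texts from this card (the field names are illustrative, not the actual dataset schema):

```python
# One illustrative training triplet (field names are hypothetical)
triplet = {
    # Italian anchor query: "How do you distinguish a strike-slip fault from a normal one?"
    "anchor": "Come si distingue una faglia trascorrente da una normale?",
    "positive": "Strike-slip faults are characterized by horizontal movement...",
    # Hard negative: topically related (also about faults) but not the answer
    "negative": "Normal faults occur due to extensional stress...",
}

# During fine-tuning, a contrastive loss pulls the anchor embedding toward
# the positive and pushes it away from the negative in the shared vector space.
for role in ("anchor", "positive", "negative"):
    assert triplet[role], f"{role} must be non-empty"
```

The value of the hard negative is precisely that it shares vocabulary with the positive: the model must learn semantic distinctions finer than keyword overlap.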
## Usage

### Basic Retrieval
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-model-name")

# Cross-lingual query-document matching
# (Italian: "How do you distinguish a strike-slip fault from a normal one?")
query = "Come si distingue una faglia trascorrente da una normale?"
documents = [
    "Strike-slip faults are characterized by horizontal movement...",
    "Normal faults occur due to extensional stress...",
    "Investment portfolio management strategies...",
]

query_embedding = model.encode(query, prompt="Represent this search query for finding relevant passages: ")
doc_embeddings = model.encode(documents, prompt="Represent this passage for retrieval: ")
similarities = model.similarity(query_embedding, doc_embeddings)
```
### Prompt Templates

The model is optimized for specific prompt templates:
- Queries: `"Represent this search query for finding relevant passages: "`
- Documents: `"Represent this passage for retrieval: "`
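
In sentence-transformers, the `prompt` argument to `encode` prepends the template to the input text before tokenization. The equivalent manual prepending, as a sketch (the helper below is illustrative, not part of this model's API):

```python
QUERY_PROMPT = "Represent this search query for finding relevant passages: "
DOC_PROMPT = "Represent this passage for retrieval: "

def with_prompt(text: str, prompt: str) -> str:
    """Prepend the instruction template, as model.encode(prompt=...) does internally."""
    return prompt + text

print(with_prompt("plate tectonics", QUERY_PROMPT))
# Represent this search query for finding relevant passages: plate tectonics
```

Using the wrong template (or none) at inference time will degrade retrieval quality, since the model was fine-tuned with these exact instruction strings.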
## Applications
- Cross-lingual information retrieval systems
- Academic and technical document search
- Multilingual question-answering platforms
- Educational content recommendation
- Professional knowledge base systems
## Limitations
- Language coverage: Primarily optimized for Italian-English pairs
- Domain specificity: Performance may vary on highly specialized domains not represented in training
- Cultural context: Reflects primarily Western/European knowledge perspectives
- Computational requirements: Dense representations require significant storage for large-scale deployment
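
To size that storage requirement: at 1,024 float32 dimensions, each vector occupies 4 KiB, so a corpus of ten million documents needs roughly 41 GB of raw embeddings before any index overhead. A quick back-of-the-envelope check:

```python
DIMS = 1024            # output dimensionality of this model
BYTES_PER_FLOAT32 = 4  # standard single-precision float

bytes_per_vector = DIMS * BYTES_PER_FLOAT32  # 4096 bytes = 4 KiB
corpus_size = 10_000_000                     # ten million documents

total_gb = bytes_per_vector * corpus_size / 1e9
print(f"{total_gb:.1f} GB")  # 41.0 GB of raw embeddings, excluding index overhead
```

Quantization (e.g. int8 or binary embeddings) can shrink this substantially, at some cost in retrieval quality.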
## Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'architecture': 'Qwen3Model'})
  (1): Pooling({'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
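
The `Pooling` module above uses last-token pooling: instead of averaging over all tokens, the hidden state of the final token represents the whole sequence, and `Normalize()` then scales it to unit L2 norm. A toy illustration with made-up 3-dimensional hidden states (the real model uses 1,024 dimensions):

```python
import math

# Hypothetical per-token hidden states for a 4-token sequence (3 dims for readability)
token_states = [
    [0.1, 0.2, 0.3],
    [0.4, 0.1, 0.0],
    [0.2, 0.2, 0.2],
    [3.0, 4.0, 0.0],  # final token: carries the sequence representation
]

# Last-token pooling: keep only the final token's hidden state
pooled = token_states[-1]

# Normalize() step: scale to unit L2 norm so cosine similarity becomes a dot product
norm = math.sqrt(sum(x * x for x in pooled))
embedding = [x / norm for x in pooled]

print(embedding)  # [0.6, 0.8, 0.0]
```

Last-token pooling suits causal (decoder-style) backbones like Qwen3, where the final position has attended to the entire sequence.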
## Citation

```bibtex
@misc{qwen3-italian-retrieval-2024,
  title={Fine-tuned Qwen3-Embedding for Italian-English Cross-Lingual Semantic Retrieval},
  year={2024},
  howpublished={\url{https://huggingface.co/your-model-name}}
}
```
## Acknowledgments
This work builds upon the Qwen3-Embedding architecture and advances in contrastive learning for dense retrieval. We acknowledge the contributions of the Qwen team and the sentence-transformers community.
## License

This model inherits the licensing terms of the base `Qwen/Qwen3-Embedding-0.6B` model.