Model Datacard: Persian Keyword Extraction Model
Model Details
- Model Name: keyword_Roberta_base_per
- Base Model: xlm-roberta-large
- Task: Keyword Extraction
- Language: Persian (Farsi)
- Developer: PakdamanAli
- Model Version: 1.0.0
Intended Use
This model is designed to extract keywords from Persian text. It can be used for:
- Automatic tagging of content
- Search engine optimization
- Content categorization
- Topic modeling
- Information retrieval enhancement
Primary Intended Uses
- Content analysis for Persian websites
- Academic research on Persian text
- Information extraction systems
Out-of-Scope Use Cases
- Translation services
- Text summarization
- Persian named entity recognition (unless specifically trained for this)
- Other NLP tasks beyond keyword extraction
Training Data
- Dataset Size: 40,000 Persian text samples
- Training Approach: Fine-tuned from xlm-roberta-large on the samples above
Performance Evaluation
Metrics and evaluation results will be published in a future update.
Limitations
- The model may not perform well on domain-specific content that was not represented in the training data
- Performance may vary for very short or extremely long texts
- The model may occasionally extract words that are not truly "key" to the content
- Dialect variations in Persian might affect extraction quality
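For the limitation on extremely long texts, one possible workaround is to split the input into overlapping word windows and run the extractor on each window separately. The sketch below is only an illustration: the helper name `split_into_chunks` and the default window sizes are assumptions, not something this model card specifies.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into overlapping windows of whitespace-separated words.

    max_words and overlap are illustrative defaults, not tuned for this model.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be passed to the pipeline on its own, and the per-chunk keyword lists combined afterwards.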
Ethical Considerations
- The model is trained on Persian text and may reflect biases present in that content
- Users should review extracted keywords for sensitive content before using them in automated systems
- The model should not be used to extract or analyze personally identifiable information without proper consent
Technical Specifications
- Input: Persian text (UTF-8 encoded)
- Output: List of extracted keywords
- Framework: Transformers (Hugging Face)
- Requirements: PyTorch, Transformers
Pipeline Usage
To use this model with the Hugging Face pipeline:
from transformers import pipeline

# Initialize the pipeline
keyword_extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_base_per",
    tokenizer="PakdamanAli/keyword_Roberta_base_per",
)

# Example usage
# English gloss: "Iran is a country with a rich history and culture that has many tourist attractions."
text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
keywords = keyword_extractor(text)

# Process the results based on the model output format
# Example: extracted_keywords = [item["word"] for item in keywords]
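How the results are processed depends on the application. The following is a minimal sketch, assuming each returned item is a dict with `word` and `score` keys, as token-classification pipelines produce. The helper name `collect_keywords` and the 0.5 threshold are illustrative choices, not values recommended for this model.

```python
def collect_keywords(predictions, min_score=0.5):
    """Return unique keyword strings, in first-seen order, above a score threshold."""
    seen = set()
    keywords = []
    for item in predictions:
        # XLM-R's SentencePiece tokenizer marks word starts with U+2581;
        # replace it with a space and trim so the plain word remains.
        word = item["word"].replace("\u2581", " ").strip()
        if not word or item.get("score", 1.0) < min_score:
            continue
        if word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

With the pipeline above, this would be called as `collect_keywords(keyword_extractor(text))`; lowering `min_score` keeps more candidates at the cost of more noise.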
Example
from transformers import pipeline

extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_base_per",
    tokenizer="PakdamanAli/keyword_Roberta_base_per",
)

# English gloss: "Iran is a country with a rich history and culture that has many tourist attractions."
text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
results = extractor(text)

# Extract just the words from the results
keywords = [item["word"] for item in results]
print(keywords)
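Because the base model uses a sub-word tokenizer, the `word` field of each result may hold a word piece rather than a whole word. One way to recover readable keywords is to merge results whose character spans touch and slice the original text. This is a sketch under assumptions: it presumes each item carries integer `start` and `end` offsets (which the pipeline provides with fast tokenizers) and that every returned item is a keyword token. The helper name `merge_adjacent_spans` is hypothetical.

```python
def merge_adjacent_spans(text, predictions):
    """Merge predictions whose character spans touch or overlap, then slice the text."""
    spans = sorted((p["start"], p["end"]) for p in predictions)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # This span continues the previous one (e.g. the next piece of a word).
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [text[s:e].strip() for s, e in merged]
```

It would be called as `merge_adjacent_spans(text, results)`. If the tokenizer's offsets include the preceding space, neighbouring keyword tokens will merge into multi-word phrases instead of single words, so the output is worth inspecting on a few examples first.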