---
language: fa
license: mit
tags:
  - keyword-extraction
  - persian
  - farsi
  - token-classification
  - xlm-roberta
  - nlp
datasets:
  - custom
metrics:
  - precision
  - recall
  - f1
widget:
  - text: "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
---

# Model Card: Persian Keyword Extraction Model

## Model Details
- **Model Name**: keyword_Roberta_large_per
- **Base Model**: xlm-roberta-large
- **Task**: Keyword Extraction
- **Language**: Persian (Farsi)
- **Developer**: PakdamanAli
- **Model Version**: 1.0.0

## Intended Use
This model is designed to extract keywords from Persian text. It can be used for:
- Automatic tagging of content
- Search engine optimization
- Content categorization
- Topic modeling
- Information retrieval enhancement

### Primary Intended Uses
- Content analysis for Persian websites
- Academic research on Persian text
- Information extraction systems

### Out-of-Scope Use Cases
- Translation services
- Text summarization
- Persian named entity recognition (unless the model is further fine-tuned for that task)
- Other NLP tasks beyond keyword extraction

## Training Data
- **Dataset Size**: 40,000 Persian text samples
- **Training Procedure**: The `xlm-roberta-large` checkpoint was fine-tuned on these samples

## Performance Evaluation
Metrics and evaluation results will be published in a future update.
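
Until official numbers are available, the metrics listed in the metadata (precision, recall, F1) can be estimated on your own labeled data. The sketch below assumes an exact-match, set-based comparison between predicted and reference keywords for a single document; the function name `keyword_prf` is hypothetical, and this is not necessarily the evaluation protocol the author will use (a token-level evaluation would give different numbers).

```python
from typing import Iterable, Tuple


def keyword_prf(predicted: Iterable[str], reference: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two keyword collections.

    Assumption: keywords are compared as whole strings after trimming
    whitespace; duplicates are ignored because both sides become sets.
    """
    pred = {k.strip() for k in predicted if k.strip()}
    ref = {k.strip() for k in reference if k.strip()}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

To score a whole test set, one option is to call this function per document and average the three values.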

## Limitations
- The model may not perform well on domain-specific content that was not represented in the training data
- Performance may vary for very short or extremely long texts
- The model may occasionally extract words that are not truly "key" to the content
- Dialect variations in Persian might affect extraction quality
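
For long documents, one possible workaround is to split the text into smaller pieces and run the extractor on each piece, since XLM-RoBERTa encoders accept at most 512 subword tokens per input. The helper below is a minimal sketch, not part of the released model: `split_into_chunks` and its `max_chars` parameter are hypothetical, and character count is used only as a rough proxy for token count.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Pack whole sentences into chunks of at most roughly `max_chars` characters.

    Assumption: sentences end with '.', '!', '?', or the Persian question
    mark, or at a line break. A single sentence longer than `max_chars`
    is kept intact as its own chunk.
    """
    pieces = re.split(r"(?<=[.!?؟])\s+|\n+", text)
    sentences = [p.strip() for p in pieces if p and p.strip()]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the pipeline separately and the resulting keywords combined afterwards.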

## Ethical Considerations
- The model is trained on Persian text and may reflect biases present in that content
- Users should review extracted keywords for sensitive content before using them in automated systems
- The model should not be used to extract or analyze personally identifiable information without proper consent

## Technical Specifications
- **Input**: Persian text (UTF-8 encoded)
- **Output**: Token-level predictions from the `token-classification` pipeline, from which the list of keywords is built
- **Framework**: Transformers (Hugging Face)
- **Requirements**: PyTorch, Transformers

## Pipeline Usage
To use this model with the Hugging Face pipeline:

```python
from transformers import pipeline

# Initialize the pipeline
keyword_extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_large_per",
    tokenizer="PakdamanAli/keyword_Roberta_large_per"
)

# Example usage
text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
keywords = keyword_extractor(text)

# Process the results based on the model output format
# Example: extracted_keywords = [item["word"] for item in keywords]
```

## Example
```python
from transformers import pipeline

extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_large_per",
    tokenizer="PakdamanAli/keyword_Roberta_large_per"
)

text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
results = extractor(text)

# Each result describes one predicted token; "word" holds its text,
# which for XLM-RoBERTa may be a subword piece rather than a full word
keywords = [item["word"] for item in results]
print(keywords)
```
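
Because the pipeline returns one entry per token, multi-token keywords may come back as separate pieces. The sketch below shows one way to merge neighbouring predictions into phrases using the `start` and `end` character offsets that the pipeline reports. The function `merge_keyword_spans`, the `max_gap` parameter, and the assumption that non-keyword tokens carry the label `"O"` are illustrative; check the model's `config.json` for its actual label names.

```python
from typing import Dict, List, Sequence


def merge_keyword_spans(
    text: str,
    predictions: List[Dict],
    ignore_labels: Sequence[str] = ("O",),  # assumed label for non-keyword tokens
    max_gap: int = 1,
) -> List[str]:
    """Merge adjacent token predictions into keyword phrases.

    Two predictions are merged when the gap between them is at most
    `max_gap` characters (for example, a single space). Note that this
    also joins two distinct keywords that happen to be adjacent.
    """
    spans: List[List[int]] = []
    for pred in predictions:
        label = pred.get("entity") or pred.get("entity_group")
        start, end = pred.get("start"), pred.get("end")
        if label in ignore_labels or start is None or end is None:
            continue
        if spans and start - spans[-1][1] <= max_gap:
            # Close enough to the previous span: extend it.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])

    keywords: List[str] = []
    for start, end in spans:
        phrase = text[start:end].strip()
        if phrase and phrase not in keywords:
            keywords.append(phrase)
    return keywords
```

With the variables from the example above, `merge_keyword_spans(text, results)` returns the phrases sliced directly from the original text, which avoids subword markers in the output.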