---
language: fa
license: mit
tags:
- keyword-extraction
- persian
- farsi
- token-classification
- xlm-roberta
- nlp
datasets:
- custom
metrics:
- precision
- recall
- f1
widget:
- text: "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
---

# Model Datacard: Persian Keyword Extraction Model

## Model Details

- **Model Name**: keyword_Roberta_large_per
- **Base Model**: xlm-roberta-large
- **Task**: Keyword Extraction
- **Language**: Persian (Farsi)
- **Developer**: PakdamanAli
- **Model Version**: 1.0.0

## Intended Use

This model is designed to extract keywords from Persian text. It can be used for:

- Automatic tagging of content
- Search engine optimization
- Content categorization
- Topic modeling
- Information retrieval enhancement

### Primary Intended Uses

- Content analysis for Persian websites
- Academic research on Persian text
- Information extraction systems

### Out-of-Scope Use Cases

- Translation services
- Text summarization
- Persian named entity recognition (unless specifically trained for this)
- Other NLP tasks beyond keyword extraction

## Training Data

- **Dataset Size**: 40,000 Persian text samples
- **Data Preparation**: Fine-tuned on xlm-roberta-large

## Performance

Evaluation metrics and evaluation results will be published in a future update.

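The card metadata lists precision, recall and F1 as metrics, but it does not yet say how they are computed. One common convention, assumed here and not confirmed by the author, is exact-match comparison between the set of predicted keywords and a set of reference keywords for each document. The function name `keyword_prf` and the example keyword lists are hypothetical and only illustrate the arithmetic.

```python
from typing import Iterable, Tuple


def keyword_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two keyword collections.

    Assumption: keywords are compared as whitespace-stripped strings, and
    duplicates are ignored. The model author's actual protocol may differ.
    """
    pred_set = {k.strip() for k in predicted if k.strip()}
    gold_set = {k.strip() for k in gold if k.strip()}

    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical predicted and reference keywords for one document.
predicted_keywords = ["ایران", "تاریخ", "فرهنگ"]
reference_keywords = ["ایران", "فرهنگ", "گردشگری"]
precision, recall, f1 = keyword_prf(predicted_keywords, reference_keywords)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is strict: a prediction that differs from the reference only in spacing or a zero-width non-joiner counts as a miss, so normalizing Persian text before comparison may be worth considering.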
## Limitations

- The model may not perform well on domain-specific content that was not represented in the training data
- Performance may vary for very short or extremely long texts
- The model may occasionally extract words that are not truly "key" to the content
- Dialect variations in Persian might affect extraction quality

## Ethical Considerations

- The model is trained on Persian text and may reflect biases present in that content
- Users should verify extracted keywords for sensitive content before implementing in automated systems
- The model should not be used to extract or analyze personally identifiable information without proper consent

## Technical Specifications

- **Input**: Persian text (UTF-8 encoded)
- **Output**: List of extracted keywords
- **Framework**: Transformers (Hugging Face)
- **Requirements**: PyTorch, Transformers

## Pipeline Usage

To use this model with the Hugging Face pipeline:

```python
from transformers import pipeline

# Initialize the pipeline
keyword_extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_large_per",
    tokenizer="PakdamanAli/keyword_Roberta_large_per"
)

# Example usage
text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
keywords = keyword_extractor(text)

# Process the results based on the model output format
# Example: extracted_keywords = [item["word"] for item in keywords]
```

## Example

```python
from transformers import pipeline

extractor = pipeline(
    task="token-classification",
    model="PakdamanAli/keyword_Roberta_large_per",
    tokenizer="PakdamanAli/keyword_Roberta_large_per"
)

text = "ایران کشوری با تاریخ و فرهنگ غنی است که دارای جاذبه‌های گردشگری فراوان می‌باشد."
results = extractor(text)

# Extract just the words from the results
keywords = [item["word"] for item in results]
print(keywords)
```
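With a SentencePiece tokenizer such as the one used by xlm-roberta-large, the `word` field of a raw token-classification result may be a subword piece rather than a whole word. The sketch below shows one way to rebuild whole keywords from the `start` and `end` character offsets that the pipeline returns alongside each token. It assumes that every returned item belongs to a keyword and that pieces of the same word have touching offsets; neither assumption is confirmed by this card, so check it against real output. The helper name `merge_keyword_spans`, its `max_gap` parameter, and the sample items are hypothetical, and an English stand-in sentence is used so the offsets are easy to verify by eye.

```python
from typing import Dict, List


def merge_keyword_spans(text: str, items: List[Dict], max_gap: int = 0) -> List[str]:
    """Merge token-level predictions into keywords using character offsets.

    Items whose start lies within `max_gap` characters of the previous span's
    end are joined into one span. max_gap=0 rejoins subword pieces;
    max_gap=1 also joins words separated by a single space into a phrase.
    """
    spans: List[List[int]] = []
    for item in sorted(items, key=lambda it: it["start"]):
        start, end = int(item["start"]), int(item["end"])
        if spans and start - spans[-1][1] <= max_gap:
            # Touching or overlapping the previous span: extend it.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])

    keywords: List[str] = []
    seen = set()
    for start, end in spans:
        keyword = text[start:end].strip()
        # Keep first occurrences only, in document order.
        if keyword and keyword not in seen:
            seen.add(keyword)
            keywords.append(keyword)
    return keywords


# Hypothetical raw items, shaped like token-classification pipeline output.
sample_text = "rich culture and tourism"
sample_items = [
    {"word": "▁cult", "start": 5, "end": 9},
    {"word": "ure", "start": 9, "end": 12},
    {"word": "▁tour", "start": 17, "end": 21},
    {"word": "ism", "start": 21, "end": 24},
]
print(merge_keyword_spans(sample_text, sample_items))
```

In practice the call would be `merge_keyword_spans(text, extractor(text))`. The pipeline's own `aggregation_strategy` argument (for example `"simple"`) is an alternative; whether its grouping suits this model depends on the label scheme, which this card does not document.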