---
license: lgpl-3.0
language:
- fa
base_model:
- HooshvareLab/bert-base-parsbert-uncased
---

# SINA-BERT: A Pre-trained Language Model for Analysis of Medical Texts in Persian

SINA-BERT is the first Persian medical language model, built on the BERT architecture (Devlin et al., 2018). It is pre-trained on a large-scale corpus of medical content, including formal and informal texts collected from a variety of online resources, to improve performance on healthcare-related tasks.

## Model Evaluation

SINA-BERT can be used for a range of Persian medical text representation tasks. In the paper, we examined the following:

1) categorization of medical questions,
2) medical sentiment analysis, and
3) medical question retrieval.

For each task, we developed annotated Persian datasets and learned task-specific representations, with particular attention to long and complex medical questions. Using the same architecture across all tasks, SINA-BERT outperforms previously released Persian BERT-based models.
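
For classification tasks such as medical question categorization, SINA-BERT can be fine-tuned with a standard Transformers sequence-classification head. The sketch below is a minimal illustration; the number of labels and the training example are placeholders, not values from the paper:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load SINA-BERT with a randomly initialized classification head.
# num_labels=7 is a placeholder; use the label count of your own dataset.
tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "hooshafzar/SINA-BERT", num_labels=7
)

# One illustrative training step on a dummy example (label 0 is arbitrary).
inputs = tokenizer("نمونه پرسش پزشکی", return_tensors="pt")  # "a sample medical question"
outputs = model(**inputs, labels=torch.tensor([0]))
outputs.loss.backward()  # gradients are now ready for an optimizer step
```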

For details on the datasets and results, please refer to the SINA-BERT paper: [arXiv:2104.07613](https://arxiv.org/pdf/2104.07613)

- **Developed by:** HooshAfzar Salamat Team
- **Language(s) (NLP):** Persian
- **Finetuned from model:** [ParsBERT](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased)

### Model Sources

- **Repository:** [GitHub](https://github.com/nasrin-taghizadeh/SinaBERT)
- **Paper:** [arXiv paper](https://arxiv.org/pdf/2104.07613)

## How to use

```python
from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the configuration, tokenizer, and model weights from the Hugging Face Hub
config = AutoConfig.from_pretrained("hooshafzar/SINA-BERT")
tokenizer = AutoTokenizer.from_pretrained("hooshafzar/SINA-BERT")
model = AutoModel.from_pretrained("hooshafzar/SINA-BERT")
```
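
Once loaded, the model produces contextual token embeddings that can serve as features for downstream tasks. The following is a minimal inference sketch; the Persian sentence is illustrative only:

```python
import torch

# Encode an illustrative Persian sentence ("the patient complains of a severe headache")
inputs = tokenizer("بیمار از سردرد شدید شکایت دارد", return_tensors="pt")

# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Token-level embeddings: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```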

## Citation

```bibtex
@article{taghizadeh2021sina,
  title={SINA-BERT: a pre-trained language model for analysis of medical texts in Persian},
  author={Taghizadeh, Nasrin and Doostmohammadi, Ehsan and Seifossadat, Elham and Rabiee, Hamid R and Tahaei, Maedeh S},
  journal={arXiv preprint arXiv:2104.07613},
  year={2021}
}
```