---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv01
pipeline_tag: token-classification
library_name: transformers
tags:
- Hadith
- Islam
- Arabic
- Isnad
---

# About Ukhbert

Ukhbert is a pair of BERT-base transformer models for use with Classical Arabic texts. Specifically, it is designed for reports composed of an isnad, the chain of narrators who transmit the report, and a matn, the actual content. Ukhbert identifies the narrators in an isnad from the text of the report.

## An example of matn and isnad

For example, consider the following text, which is a hadith, a report attributed to the Prophet Muhammad:

حَدَّثَنَا قُتَيْبَةُ بْنُ سَعِيدٍ، حَدَّثَنَا عَبْدُ الْوَهَّابِ، قَالَ سَمِعْتُ يَحْيَى بْنَ سَعِيدٍ، يَقُولُ أَخْبَرَنِي مُحَمَّدُ بْنُ إِبْرَاهِيمَ، أَنَّهُ سَمِعَ عَلْقَمَةَ بْنَ وَقَّاصٍ اللَّيْثِيَّ، يَقُولُ سَمِعْتُ عُمَرَ بْنَ الْخَطَّابِ ـ رضى الله عنه ـ يَقُولُ سَمِعْتُ رَسُولَ اللَّهِ صلى الله عليه وسلم يَقُولُ ‏ "‏ إِنَّمَا الأَعْمَالُ بِالنِّيَّةِ، وَإِنَّمَا لاِمْرِئٍ مَا نَوَى، فَمَنْ كَانَتْ هِجْرَتُهُ إِلَى اللَّهِ وَرَسُولِهِ فَهِجْرَتُهُ إِلَى اللَّهِ وَرَسُولِهِ، وَمَنْ كَانَتْ هِجْرَتُهُ إِلَى دُنْيَا يُصِيبُهَا أَوِ امْرَأَةٍ يَتَزَوَّجُهَا، فَهِجْرَتُهُ إِلَى مَا هَاجَرَ إِلَيْهِ ‏‏.‏

Narrated 'Umar bin Al-Khattab: I heard Allah's Messenger (ﷺ) saying, "The (reward of) deeds, depend upon the intentions and every person will get the reward according to what he has intended. So whoever emigrated for the sake of Allah and His Apostle, then his emigration will be considered to be for Allah and His Apostle, and whoever emigrated for the sake of worldly gain or for a woman to marry, then his emigration will be considered to be for what he emigrated for."

Sahih al-Bukhari 6689, https://sunnah.com/bukhari:6689

The isnad is this:

حَدَّثَنَا قُتَيْبَةُ بْنُ سَعِيدٍ، حَدَّثَنَا عَبْدُ الْوَهَّابِ، قَالَ سَمِعْتُ يَحْيَى بْنَ سَعِيدٍ، يَقُولُ أَخْبَرَنِي مُحَمَّدُ بْنُ إِبْرَاهِيمَ، أَنَّهُ سَمِعَ عَلْقَمَةَ بْنَ وَقَّاصٍ اللَّيْثِيَّ، يَقُولُ سَمِعْتُ عُمَرَ بْنَ الْخَطَّابِ

Qutaybah bin Saeed narrated from Abdul Wahhab, who said he heard Yahya bin Saeed, who said it was reported to him by Muhammad bin Ibrahim, who heard Alqamah bin Waqqas al-Laythi, who said he heard from Umar bin al-Khattab.

Ukhbert is used first to identify the narrators' names in the text, and then to determine who they are.

## Ukhbert Narrator Linking

This is the second half of the Ukhbert pair. It links each narrator name to an ID from which the user can look up narrator information. For a comma-separated list of names, where the spaces within each name are replaced with '_', the model outputs a label of the form `L{Narrator_ID}` for each name token it identifies.

## Uses

The model is intended for finding narrator IDs from the text of narrator names. Once the narrator ID is retrieved, we recommend using [this space](https://huggingface.co/spaces/FDSRashid/Narrator_Network_Retriever) to retrieve information about the identified narrator. We strongly recommend following the instructions on how to use the model in code, because the model's outputs are very sensitive to how inference is run.

### Out-of-Scope Use

This model is strongly tied to the first half of Ukhbert, and we recommend using the two together. The utility functions we recommend have not been generalized to work with models other than ours. Use with caution.

## How to Get Started with the Model

The most successful way (likely the only successful way) to use this model is to tokenize the inputs, map the tokens back to words, run the model to get predictions, and then aggregate the predictions so that they map to the proper names.
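
The aggregation step in that procedure can be sketched in isolation. The following is a minimal illustration, not the utility code itself: it assumes that a word takes the label predicted for its first sub-token, and that `word_ids` has the shape returned by a fast tokenizer's `BatchEncoding.word_ids()` (one word index per token, `None` for special tokens). The function name `aggregate_word_labels` and the toy `id2label` mapping are hypothetical.

```python
from typing import Dict, List, Optional


def aggregate_word_labels(
    word_ids: List[Optional[int]],
    token_label_ids: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Collapse per-token predictions into one label per word.

    Assumption (not confirmed by the model card): the label of a word's
    first sub-token is used and later sub-tokens are ignored.
    """
    labels: List[str] = []
    seen = set()
    for word_id, label_id in zip(word_ids, token_label_ids):
        # Skip special tokens (None) and sub-tokens of a word already handled.
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        labels.append(id2label[label_id])
    return labels


# Toy values: a special token, two sub-tokens of word 0,
# one sub-token of word 1, and a closing special token.
word_ids = [None, 0, 0, 1, None]
token_label_ids = [1, 0, 1, 1, 0]
id2label = {0: "L6460", 1: "L5280"}  # hypothetical mapping, for illustration only

word_labels = aggregate_word_labels(word_ids, token_label_ids, id2label)
print(word_labels)
```

Other aggregation rules (for example, a majority vote over sub-tokens or averaging the logits) are equally plausible; the utility functions recommended in this card may use a different one.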
We have not yet found a way to make the HF pipeline perform these steps and produce the outputs we want. Therefore, do *not* use the pipeline for inference.

Furthermore, the data must be preprocessed with an extra step, unlike Ukhbert narrator detection. Your list of narrators must be a single string in which the names are separated by a comma and a space (the example below uses the Arabic comma '،'), and the spaces between the parts of each name are replaced with underscores. Here is one example:

```
["قتيبة_بن_سعيد، عبد_الوهاب، يحيى_بن_سعيد، محمد_بن_إبراهيم، علقمة_بن_وقاص_الليثي، عمر_بن_الخطاب"]
```

You may pass multiple strings, one per isnad. Each string can contain any number of names.

We have compiled code that has so far been the only way for us to get the intended results. This is because one of the original dependencies of this model was the `simpletransformers` package, which is no longer maintained.

### Requirements for working example

The code we use requires only the `transformers` library and the `httpimport` library. `httpimport` has no additional dependencies. We use it to import the utility functions that we have found to work.

### Working example

This is a working example with the model and the utility code. We *strongly* recommend using it. We have not found alternative code. However, feel free to edit it as you see fit, or open a pull request against the Gist code if you find a better approach.
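
If your narrator names start out as a plain list, the input string for the example can be built with a small helper. This is a minimal sketch of the formatting rule described above, assuming the Arabic comma followed by a space as the separator, as in the example input; `format_isnad` is a hypothetical name and is not part of the utility code.

```python
from typing import List

ARABIC_COMMA = "\u060c"  # the '،' character used in the example input


def format_isnad(names: List[str]) -> str:
    """Join narrator names into the single-string format shown in this card."""
    # Replace whitespace inside each name with underscores.
    underscored = ["_".join(name.split()) for name in names]
    # Separate names with an Arabic comma followed by a space.
    return (ARABIC_COMMA + " ").join(underscored)


names = ["قتيبة بن سعيد", "عبد الوهاب", "يحيى بن سعيد"]
to_predict = [format_isnad(names)]  # one string per isnad
print(to_predict)
```

The resulting `to_predict` list has the same shape as the `to_predict` argument in the working example.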
```python
import httpimport

url = "https://gist.githubusercontent.com/FDSRashid/55c14a8e9ba360b640cfca0b612ccd9a/raw/6ae62db2d8aef3691b8e1260899aebf428ed1371"

with httpimport.remote_repo(url):
    # Import the module that contains the utility function for predictions.
    import narrator_link_utils

narrator_link_utils.predict_narrators(
    "HikmaLabs/ukhbert_narrator_linking",
    to_predict=["قتيبة_بن_سعيد، عبد_الوهاب، يحيى_بن_سعيد، محمد_بن_إبراهيم، علقمة_بن_وقاص_الليثي، عمر_بن_الخطاب"],
)

# Output (truncated):
# ([[{'قتيبة_بن_سعيد،': 'L6460'},
#    {'عبد_الوهاب،': 'L5280'},
#    {'يحيى_بن_سعيد،': 'L8272'},
#    {'محمد_بن_إبراهيم،': 'L6796'},
#    {'علقمة_بن_وقاص_الليثي،': 'L5719'},
#    {'عمر_بن_الخطاب': 'L5913'}]],
#  [[{'قتيبة_بن_سعيد،': [[np.float32(-2.8520968), np.float32(-2.7746296),
#      np.float32(-2.0540233), np.float32(-1.2407739), np.float32(1.2954512),
#      np.float32(-2.2372236), np.float32(-1.238617), np.float32(-2.3866634),
#      np.float32(0.18253404), np.float32(-2.643002), np.float32(-2.656282),
#      np.float32(-1.6206467), np.float32(-2.0487826), np.float32(-1.9577142),
#      np.float32(-1.773613), np.float32(-0.9975122), np.float32(-0.38102615),
#      np.float32(0.8699675), np.float32(-2.4253492),
#      ...
#      np.float32(-3.5996199), np.float32(-1.7653825), np.float32(-3.09321),
#      np.float32(-3.2217002),
#      ...]]}]])
```

# Training

Ukhbert is fine-tuned from the AraBERT v0.1 model.
For more details, please consult the referenced paper, "Learning to Identify Narrators in Classical Arabic Texts".

# Citing

If you use the model, please cite this paper:

```bibtex
@article{ALKAOUD2021335,
  title = {Learning to Identify Narrators in Classical Arabic Texts},
  journal = {Procedia Computer Science},
  volume = {189},
  pages = {335-342},
  year = {2021},
  note = {AI in Computational Linguistics},
  issn = {1877-0509},
  doi = {https://doi.org/10.1016/j.procs.2021.05.109},
  url = {https://www.sciencedirect.com/science/article/pii/S1877050921012369},
  author = {Mohamed Alkaoud and Mairaj Syed},
  keywords = {NLP, Classical Arabic, Entity linking, Named-entity recognition, Digital humanities, Hadith science},
  abstract = {One widespread historical method of transmitting and recording information about important events and people in the Middle East is the narration-based method. In this method, each saying about a person or event is transmitted from person to person until a systematic collector records and compiles such sayings in a stable collection. At each stage of transmission, the narrator not only transmits the saying but also the person he got it from going back to the earliest narrator. Identifying each narrator in these collections is important to better measure the accuracy of the narrations and identify the date and geographies of their circulation. In this work, we propose a natural language processing technique to automate the identification of narrators in classical Arabic texts. Our proposed technique consists of two models: 1) a model for detecting the narrators in the text, and 2) a model for linking narrators to their biographies. We train our two models on a large collection of annotated classical Arabic texts and achieve F1-scores of 96.15% and 95.74% for narration detection and linking respectively.}
}
```

## Model Card Authors

Ferdaws Rashid

## Model Card Contact

frashid@berkeley.edu