farnazzeidi committed on
Commit 9f8203b · verified · 1 Parent(s): e9ec7fb

Upload 2 files

Files changed (2)
  1. README.md +97 -0
  2. license.txt +8 -0
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ license: apache-2.0
+ language:
+ - tr
+ ---
+
+ # NER Model for Legal Texts
+
+ Released in January 2024, this is a Turkish BERT language model pretrained from scratch with an **optimized BERT architecture** on a 2 GB Turkish legal corpus. The corpus was drawn from law-related theses available in the Higher Education Board National Thesis Center (YÖKTEZ). The model was then fine-tuned for Named Entity Recognition (NER) on human-annotated datasets provided by **NewMind**, a legal tech company in Istanbul, Turkey.
+
+ Our paper outlines the steps taken to train this model and compares it with previous approaches, which it outperforms.
+
+ ---
+
+ ## Overview
+ - **Preprint Paper**: [https://arxiv.org/abs/2407.00648](https://arxiv.org/abs/2407.00648)
+ - **Architecture**: Optimized BERT Base
+ - **Language**: Turkish
+ - **Supported Labels**:
+   - `Person`
+   - `Law`
+   - `Publication`
+   - `Government`
+   - `Corporation`
+   - `Other`
+   - `Project`
+   - `Money`
+   - `Date`
+   - `Location`
+   - `Court`
+
+ **Model Name**: LegalTurk Optimized BERT
+
+ ---
+
+ ## How to Use
+
+ ### Use a pipeline as a high-level helper
+ ```python
+ from transformers import pipeline
+
+ # Load the NER pipeline; "simple" aggregation groups sub-word tokens into entity spans
+ ner = pipeline("ner", model="farnazzeidi/ner-bert-law-model", aggregation_strategy="simple")
+
+ # Input text (Turkish), roughly: "Here, one must pay attention to the distinction
+ # between the Notification Law and the VUK regulation."
+ text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."
+
+ # Get predictions
+ predictions = ner(text)
+ print(predictions)
+ ```
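With an aggregation strategy set, the `transformers` token-classification pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one possible way to post-process such a list: keep spans above a confidence threshold and group them by label. The helper name `group_entities`, the `min_score` default, and the sample predictions are all hypothetical; the sample is hand-written for illustration and is not real output of this model, and it assumes the label strings match the names under Supported Labels.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group aggregated NER predictions by label, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions, hand-written for illustration (not real model output)
sample_predictions = [
    {"entity_group": "Law", "score": 0.98, "word": "Tebligat Kanunu", "start": 8, "end": 23},
    {"entity_group": "Law", "score": 0.91, "word": "VUK", "start": 28, "end": 31},
    {"entity_group": "Other", "score": 0.32, "word": "düzenlemesi", "start": 32, "end": 43},
]

print(group_entities(sample_predictions))
# → {'Law': ['Tebligat Kanunu', 'VUK']}
```

Lowering `min_score` keeps more spans at the cost of more false positives; a suitable threshold depends on the downstream use and would need to be tuned on annotated data.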
+
+ ### Load model directly
+ ```python
+ import torch
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("farnazzeidi/ner-bert-law-model")
+ model = AutoModelForTokenClassification.from_pretrained("farnazzeidi/ner-bert-law-model")
+
+ text = "Burada, Tebligat Kanunu ile VUK düzenlemesi ayrımına dikkat etmek gerekir."
+ inputs = tokenizer(text, return_tensors="pt")
+
+ # Inference only, so skip gradient tracking
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Pick the highest-scoring label per token (softmax would not change the argmax)
+ label_ids = outputs.logits.argmax(dim=-1)[0]
+
+ # Map predictions to label names, skipping the tokenizer's special tokens
+ predictions = [
+     (token, model.config.id2label[label_id.item()])
+     for token, label_id in zip(
+         tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()),
+         label_ids,
+     )
+     if token not in tokenizer.all_special_tokens
+ ]
+
+ print(predictions)
+ ```
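The direct-loading example yields one `(token, label)` pair per sub-word token, so a single word may appear as several pieces. The sketch below shows one possible way to merge the pieces back into words. It assumes the tokenizer marks continuation pieces with the standard BERT WordPiece prefix `##`, which this model card does not confirm, and it keeps the label of the first piece of each word, which is only one of several reasonable conventions. The helper `merge_wordpieces`, the sample tokenization, and the `O` tag for non-entity tokens are hypothetical and written by hand for illustration.

```python
def merge_wordpieces(token_label_pairs):
    """Merge '##'-prefixed WordPiece continuations into whole words.

    Each merged word keeps the label predicted for its first sub-token.
    """
    merged = []
    for token, label in token_label_pairs:
        if token.startswith("##") and merged:
            # Continuation piece: append it to the previous word, keep that word's label
            word, first_label = merged[-1]
            merged[-1] = (word + token[2:], first_label)
        else:
            merged.append((token, label))
    return merged


# Hypothetical tokenization and labels, hand-written for illustration
sample = [
    ("Teb", "Law"),
    ("##ligat", "Law"),
    ("Kanunu", "Law"),
    ("ile", "O"),  # "O" is an assumed placeholder for a non-entity tag
    ("VUK", "Law"),
]

print(merge_wordpieces(sample))
# → [('Tebligat', 'Law'), ('Kanunu', 'Law'), ('ile', 'O'), ('VUK', 'Law')]
```

For most uses the pipeline with `aggregation_strategy` set is the simpler route, since it performs this kind of merging internally.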
+ ---
+ ## Authors
+ Farnaz Zeidi, Mehmet Fatih Amasyali, Çigdem Erol
+
+ ---
+ ## License
+
+ This model is licensed under the Apache 2.0 License.
+
+ **Apache 2.0 License Summary**
+
+ You are free to:
+
+ - Use the model commercially and privately.
+ - Modify and distribute the model.
+
+ However, you must:
+
+ - Provide attribution to the authors and include the license notice in derivative works.
+
+ See the full Apache 2.0 License for more details: https://www.apache.org/licenses/LICENSE-2.0
license.txt ADDED
@@ -0,0 +1,8 @@
+ Copyright 2024 Farnaz Zeidi
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+