felipemaiapolo/legalnlp-bert

felipemaiapolo committed on Jun 12, 2023

Commit 1bd0fd9 · 1 Parent(s): 120b13a

Create README.md

Files changed (1)

README.md +62 -0

	@@ -0,0 +1,62 @@

+---
+license: mit
+language:
+- pt
+---
+# BERTikal (aka `legalnlp-bert`)
+BERTikal [1] is a cased BERT-base model for Brazilian legal language, trained from the BERTimbau [2] checkpoint on Brazilian legal texts. Details on the datasets and training procedure are given in [1].
+## Usage
+```python
+from transformers import AutoTokenizer  # or BertTokenizer
+from transformers import AutoModelForPreTraining  # or BertForPreTraining, to load the pretraining heads
+from transformers import AutoModel  # or BertModel, for BERT without the pretraining heads
+# Load the model with its pretraining heads and the cased tokenizer
+model = AutoModelForPreTraining.from_pretrained('felipemaiapolo/legalnlp-bert')
+tokenizer = AutoTokenizer.from_pretrained('felipemaiapolo/legalnlp-bert', do_lower_case=False)
+```
+### Example: extracting BERT embeddings
+```python
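+import torch
+# Reuses `tokenizer` and the `AutoModel` import from the Usage block above
+model = AutoModel.from_pretrained('felipemaiapolo/legalnlp-bert')
+input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')
+with torch.no_grad():
+    outs = model(input_ids)
+    # outs[0] is the last hidden state; [0, ...] selects the single sequence in the batch
+    encoded = outs[0][0, 1:-1]  # Ignore [CLS] and [SEP] special tokens
+# encoded.shape: (8, 768)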
+# tensor([[-0.0398, -0.3057,  0.2431,  ..., -0.5420,  0.1857, -0.5775],
+#         [-0.2926, -0.1957,  0.7020,  ..., -0.2843,  0.0530, -0.4304],
+#         [ 0.2463, -0.1467,  0.5496,  ...,  0.3781, -0.2325, -0.5469],
+#         ...,
+#         [ 0.0662,  0.7817,  0.3486,  ..., -0.4131, -0.2852, -0.2819],
+#         [ 0.0662,  0.2845,  0.1871,  ..., -0.2542, -0.2933, -0.0661],
+#         [ 0.2761, -0.1657,  0.3288,  ..., -0.2102,  0.0029, -0.2009]])
+```
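The embeddings example above yields one 768-dimensional vector per token. A common follow-up, not something this README prescribes, is to average those rows into a single vector per text and compare texts by cosine similarity. The sketch below illustrates the idea in plain Python on toy 3-dimensional vectors; `mean_pool` and `cosine_similarity` are hypothetical helper names, and the inputs are assumed to be nested lists such as those produced by `encoded.tolist()`.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for real token embeddings (assumption: real rows would
# come from encoded.tolist() and have 768 entries each).
doc_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
doc_b = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]

vec_a = mean_pool(doc_a)  # [2.0, 0.0, 1.0]
vec_b = mean_pool(doc_b)  # [2.0, 0.0, 1.0]
similarity = cosine_similarity(vec_a, vec_b)  # close to 1.0: same direction
```

With real tensors, the same pooling can be written as `encoded.mean(dim=0)`. Mean pooling is only one convention; whether it suits a given legal-text task is an empirical question.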
+# Cite
+  Polo, Felipe Maia, et al. "LegalNLP-Natural Language Processing methods for the Brazilian Legal Language." Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2021.
+    @inproceedings{polo2021legalnlp,
+      title={LegalNLP-Natural Language Processing methods for the Brazilian Legal Language},
+      author={Polo, Felipe Maia and Mendon{\c{c}}a, Gabriel Caiaffa Floriano and Parreira, Kau{\^e} Capellato J and Gianvechio, Lucka and Cordeiro, Peterson and Ferreira, Jonathan Batista and de Lima, Leticia Maria Paz and do Amaral Maia, Ant{\^o}nio Carlos and Vicente, Renato},
+      booktitle={Anais do XVIII Encontro Nacional de Intelig{\^e}ncia Artificial e Computacional},
+      pages={763--774},
+      year={2021},
+      organization={SBC}
+    }
+# References
+[1] Polo, Felipe Maia, et al. "LegalNLP-Natural Language Processing methods for the Brazilian Legal Language." Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional. SBC, 2021.
+[2] Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems, BRACIS, Rio Grande do Sul, Brazil, October 20-23