metadata
language: pt
license: mit
tags:
  - sentence-transformers

LegalBERTPT-br

LegalBERTPT-br is a sentence embedding model trained with SimCSE, a contrastive learning framework, on top of BERTimbau, a pre-trained Portuguese language model.
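As a rough illustration of the kind of contrastive objective SimCSE optimizes, the sketch below computes an InfoNCE-style loss with in-batch negatives in plain Python. This README does not describe the training setup, so the temperature of 0.05, the function names, and the toy vectors are assumptions made for illustration only. In the unsupervised variant of SimCSE, the two views of a sentence come from encoding it twice with different dropout masks; here they are simply passed in as lists of vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(view_a, view_b, temperature=0.05):
    """Mean contrastive loss over a batch.

    view_a[i] and view_b[i] are two embeddings of the same sentence
    (the positive pair); every view_b[j] with j != i acts as an
    in-batch negative for view_a[i].
    The temperature value is an assumption, not taken from this README.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the positive pair as the correct class.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy batch: each row of view_b is a slightly perturbed copy of view_a.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = simcse_loss(view_a, view_b)
# Swapping the rows pairs every sentence with the wrong partner.
shuffled_loss = simcse_loss(view_a, view_b[::-1])
```

The loss is small when each sentence is closest to its own second view and grows when the pairs are mismatched, which is the behavior the training procedure exploits to pull paraphrase-like views together and push other sentences apart.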

Corpora

- From this site, we used the Conteudo column, which holds 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and the Hashtag Corpus to avoid bias.
- From this site, we also used 147,008 bills. From each bill, we used the summary field, txtEmenta, and the core text field, txtExplicacaoEmenta.
- From Political Speeches, we used 462,831 texts, specifically the columns sumario, textodiscurso, and indexacao.

These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.
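The segmentation and concatenation step can be sketched as follows. The README does not say which sentence segmenter was used, so the regex-based splitter, the function names, and the sample texts below are hypothetical stand-ins rather than the authors' pipeline.

```python
import re

# Hypothetical rule: split after ".", "!" or "?" when the next token
# starts with an uppercase letter (including accented Portuguese capitals).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÀ-Ý])")


def split_sentences(text):
    """Split one text into sentences, dropping empty fragments."""
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


def build_sentence_corpus(corpora):
    """Segment every text of every corpus and concatenate the results."""
    sentences = []
    for texts in corpora:
        for text in texts:
            sentences.extend(split_sentences(text))
    return sentences


# Illustrative inputs only; they are not drawn from the real corpora.
comments = ["Sou a favor do projeto. Ele ajuda a população!"]
bills = ["Altera a lei. Dá outras providências."]
corpus = build_sentence_corpus([comments, bills])
```

A rule this simple will wrongly split after abbreviations followed by a capitalized word (for example "Sr. Presidente"), so a dedicated Portuguese sentence tokenizer would be a better fit for real data.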

Citing and Authors

This model was trained with the sentence-transformers library.

If you find this model helpful, feel free to cite our publication Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies:

@inproceedings{bracis,
 author = {Nádia Silva and Marília Silva and Fabíola Pereira and João Tarrega and João Beinotti and Márcio Fonseca and Francisco Andrade and André Carvalho},
 title = {Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies},
 booktitle = {Anais da X Brazilian Conference on Intelligent Systems},
 location = {Online},
 year = {2021},
 keywords = {},
 issn = {0000-0000},
 publisher = {SBC},
 address = {Porto Alegre, RS, Brasil},
 url = {https://sol.sbc.org.br/index.php/bracis/article/view/19061}
}