language: pt
license: mit
tags:
- sentence-transformers
LegalBERTPT-br
LegalBERTPT-br is a sentence embedding model trained with SimCSE, a contrastive learning framework, on top of BERTimbau, a pre-trained language model for Portuguese.
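To make the contrastive objective concrete, here is a minimal sketch of a SimCSE-style loss in plain Python. It assumes the unsupervised setup, where each sentence is encoded twice and the other sentences in the batch act as negatives, and it uses a temperature of 0.05 as an illustrative default; this card does not state which variant or hyperparameters were used. The function names and the toy vectors are hypothetical and stand in for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(view_a, view_b, temperature=0.05):
    """Mean contrastive loss over a batch.

    view_a[i] and view_b[i] are two encodings of sentence i (the positive
    pair); every other row of view_b is treated as an in-batch negative.
    The temperature value is an assumption, not taken from this card.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over the row of similarities.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        # Cross-entropy with the matching index i as the target class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional "embeddings" standing in for encoder outputs.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = simcse_loss(view_a, view_b)
shuffled = simcse_loss(view_a, list(reversed(view_b)))
print(aligned < shuffled)
```

The loss is low when the two encodings of the same sentence are closer to each other than to the rest of the batch, and high when the pairs are mismatched, which is the property the training procedure optimizes.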
Corpora
– From this site, we used the column Conteudo, with 215,713 comments. We removed the comments from PL 3723/2019, PEC 471/2005, and Hashtag Corpus to avoid bias.
– From this site, we also used 147,008 bills. From these bills, we used the summary field txtEmenta and the core text field txtExplicacaoEmenta.
– From Political Speeches, we used 462,831 texts, specifically the columns sumario, textodiscurso, and indexacao.
These corpora were segmented into sentences and concatenated, producing 2,307,426 sentences.
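The segment-and-concatenate step can be sketched as follows. This is only an illustration: the card does not say which sentence splitter was used, so the regex below is a naive stand-in that will mis-split abbreviations such as "art. 5". The function names and the sample texts are invented for the example and are not drawn from the actual corpora.

```python
import re

# Naive boundary: whitespace that follows ".", "!" or "?".
# This is an assumption; the original segmentation tool is not documented here.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split one text into trimmed, non-empty sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def build_sentence_corpus(*corpora):
    """Segment every text of every corpus and concatenate the results."""
    sentences = []
    for corpus in corpora:
        for text in corpus:
            sentences.extend(split_sentences(text))
    return sentences


# Invented sample texts, one per source type described above.
comments = ["Sou contra o projeto. Ele prejudica os trabalhadores!"]
bills = ["Altera a Lei de Diretrizes e Bases da Educação."]
speeches = ["Senhor Presidente, peço a palavra. Obrigado."]

corpus = build_sentence_corpus(comments, bills, speeches)
print(len(corpus))
```

A dedicated sentence tokenizer for Portuguese would likely handle legal abbreviations and numbered articles better than this regex.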
Citing and Authors
This model was trained by sentence-transformers.
If you find this model helpful, feel free to cite our publication Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies:
@inproceedings{bracis,
author = {Nádia Silva and Marília Silva and Fabíola Pereira and João Tarrega and João Beinotti and Márcio Fonseca and Francisco Andrade and André Carvalho},
title = {Evaluating Topic Models in Portuguese Political Comments About Bills from Brazil’s Chamber of Deputies},
booktitle = {Anais da X Brazilian Conference on Intelligent Systems},
location = {Online},
year = {2021},
keywords = {},
issn = {0000-0000},
publisher = {SBC},
address = {Porto Alegre, RS, Brasil},
url = {https://sol.sbc.org.br/index.php/bracis/article/view/19061}
}