higopires committed
Commit 159e956 · verified · 1 Parent(s): 04de58c

Update README.md

Files changed (1)
  1. README.md +87 -36
README.md CHANGED
@@ -1,53 +1,104 @@
- ---
- tags:
- - generated_from_trainer
- model-index:
- - name: DeB3RTa_3
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # DeB3RTa_3
-
- This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 192
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 1536
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.01
- - num_epochs: 50.0
-
- ### Training results
-
- ### Framework versions
-
- - Transformers 4.30.2
- - Pytorch 2.2.0+cu121
- - Datasets 2.13.1
- - Tokenizers 0.13.3
+ # DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain
+
+ DeB3RTa is a family of transformer-based language models specifically designed for Portuguese financial text processing. These models are built on the DeBERTa-v2 architecture and trained using a mixed-domain pretraining strategy that combines financial, political, business management, and accounting corpora.
+
+ ## Model Variants
+
+ Two variants are available:
+
+ - **DeB3RTa-base**: 12 attention heads, 12 layers, intermediate size of 3072, hidden size of 768 (~426M parameters)
+ - **DeB3RTa-small**: 6 attention heads, 12 layers, intermediate size of 1536, hidden size of 384 (~70M parameters)
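+
+ The dimensions above can be read back from each checkpoint's published configuration. A minimal sketch, assuming the repository IDs follow the `higopires/DeB3RTa-[base/small]` pattern used in the Usage section below:
+
+ ```python
+ from transformers import AutoConfig
+
+ # Inspect the architecture of the base variant (swap in "DeB3RTa-small" for the smaller one)
+ config = AutoConfig.from_pretrained("higopires/DeB3RTa-base")
+ print(config.num_hidden_layers, config.num_attention_heads,
+       config.hidden_size, config.intermediate_size)
+ ```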
+
+ ## Key Features
+
+ - First Portuguese financial domain-specific transformer model
+ - Mixed-domain pretraining incorporating finance, politics, business, and accounting texts
+ - Enhanced performance on financial NLP tasks compared to general-domain models
+ - Resource-efficient architecture with a strong performance-to-parameter ratio
+ - Advanced fine-tuning techniques including layer reinitialization, mixout regularization, and layer-wise learning rate decay (the last of these is sketched below)
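+
+ As a rough illustration of the last technique, layer-wise learning-rate decay assigns smaller learning rates to layers close to the embeddings and larger ones to layers near the output head. The sketch below is a generic example rather than the exact recipe used for DeB3RTa; the repository ID, label count, and decay factor are illustrative:
+
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification
+
+ model = AutoModelForSequenceClassification.from_pretrained("higopires/DeB3RTa-base", num_labels=2)
+ base_lr, decay = 2e-5, 0.95
+ num_layers = model.config.num_hidden_layers
+
+ param_groups = []
+ for name, param in model.named_parameters():
+     if "embeddings" in name:       # embeddings sit at the bottom of the stack
+         depth = 0
+     elif ".layer." in name:        # encoder layer i sits at depth i + 1
+         depth = int(name.split(".layer.")[1].split(".")[0]) + 1
+     else:                          # pooler/classifier head gets the full rate
+         depth = num_layers + 1
+     param_groups.append({"params": [param], "lr": base_lr * decay ** (num_layers + 1 - depth)})
+
+ optimizer = torch.optim.AdamW(param_groups)
+ ```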
+
+ ## Performance
+
+ The models have been evaluated on multiple financial-domain tasks:
+
+ | Task | Dataset | DeB3RTa-base F1 | DeB3RTa-small F1 |
+ |------|---------|-----------------|------------------|
+ | Fake News Detection | FAKE.BR | 0.9906 | 0.9598 |
+ | Sentiment Analysis | CAROSIA | 0.9207 | 0.8722 |
+ | Regulatory Classification | BBRC | 0.7609 | 0.6712 |
+ | Hate Speech Detection | OFFCOMBR-3 | 0.7539 | 0.5460 |
+
+ ## Training Data
+
+ The models were trained on a diverse corpus of 1.05 billion tokens, including:
+
+ - Relevant facts from the financial market (2003-2023)
+ - Financial patents (2006-2021)
+ - Research articles from Brazilian SciELO
+ - Financial news articles (1999-2023)
+ - Wikipedia articles in Portuguese
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+ # Load model and tokenizer (replace [base/small] with the desired variant)
+ model = AutoModelForMaskedLM.from_pretrained("higopires/DeB3RTa-[base/small]")
+ tokenizer = AutoTokenizer.from_pretrained("higopires/DeB3RTa-[base/small]")
+
+ # Example usage
+ text = "O mercado financeiro brasileiro apresentou [MASK] no último trimestre."
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model(**inputs)
+ ```
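+
+ To turn `outputs` into actual predictions for the `[MASK]` position, one option is to decode the highest-scoring tokens from the logits. A minimal continuation of the snippet above (assuming a PyTorch backend; variable names are illustrative):
+
+ ```python
+ import torch
+
+ # Locate the masked position in the input
+ mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
+
+ # Decode the top-5 candidate tokens at that position
+ top_ids = torch.topk(outputs.logits[0, mask_index], k=5, dim=-1).indices[0]
+ print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
+ ```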
+
+ ## Citations
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @article{pires2025deb3rta,
+   title={DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain},
+   author={Pires, Higo and Paucar, Leonardo and Carvalho, Joao Paulo},
+   journal={Big Data and Cognitive Computing},
+   year={2025},
+   volume={1},
+   number={0},
+   publisher={MDPI}
+ }
+ ```
+
+ ## Limitations
+
+ - Performance degradation on the smaller variant, particularly for hate speech detection
+ - May require task-specific fine-tuning for optimal performance
+ - Limited evaluation on multilingual financial tasks
+ - Model behavior on very long documents (>128 tokens) not extensively tested
+
+ ## License
+
+ MIT License
+
+ Copyright (c) 2025 Higo Pires
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ ## Acknowledgments
+
+ This work was supported by the Instituto Federal de Educação, Ciência e Tecnologia do Maranhão and by the Human Language Technology Lab at the Instituto de Engenharia de Sistemas e Computadores—Investigação e Desenvolvimento (INESC-ID).