# DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain

DeB3RTa is a family of transformer-based language models specifically designed for Portuguese financial text processing. These models are built on the DeBERTa-v2 architecture and trained using a comprehensive mixed-domain pretraining strategy that combines financial, political, business management, and accounting corpora.

## Model Variants

Two variants are available:

- **DeB3RTa-base**: 12 attention heads, 12 layers, intermediate size of 3072, hidden size of 768 (~426M parameters)
- **DeB3RTa-small**: 6 attention heads, 12 layers, intermediate size of 1536, hidden size of 384 (~70M parameters)
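
These settings can be read directly from the released configurations. A minimal sketch, assuming the repository ids follow the pattern shown in the Usage section below:

```python
from transformers import AutoConfig

# Print the main architectural hyperparameters of each variant
for repo_id in ("higopires/DeB3RTa-base", "higopires/DeB3RTa-small"):
    cfg = AutoConfig.from_pretrained(repo_id)
    print(repo_id, cfg.num_attention_heads, cfg.num_hidden_layers,
          cfg.intermediate_size, cfg.hidden_size)
```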

## Key Features

- First Portuguese financial domain-specific transformer model
- Mixed-domain pretraining incorporating finance, politics, business, and accounting texts
- Enhanced performance on financial NLP tasks compared to general-domain models
- Resource-efficient architecture with strong performance-to-parameter ratio
- Advanced fine-tuning techniques including layer reinitialization, mixout regularization, and layer-wise learning rate decay
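
As an illustration of the last point, the sketch below shows one common way to set up layer-wise learning rate decay for fine-tuning. It assumes the checkpoint loads as a standard DeBERTa-v2 model; the learning rate, decay factor, and label count are illustrative, not the values used to produce the reported results.

```python
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "higopires/DeB3RTa-base", num_labels=2  # illustrative label count
)

base_lr, decay = 2e-5, 0.9  # illustrative hyperparameters
num_layers = model.config.num_hidden_layers

# Deeper encoder layers keep a rate close to base_lr; earlier layers decay more.
param_groups = []
for i, layer in enumerate(model.deberta.encoder.layer):
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr * decay ** (num_layers - 1 - i)})

# Embeddings get the smallest rate; the task head keeps the base rate.
# (Other modules, e.g. relative position embeddings, are omitted for brevity.)
param_groups.append({"params": model.deberta.embeddings.parameters(),
                     "lr": base_lr * decay ** num_layers})
param_groups.append({"params": list(model.pooler.parameters()) +
                               list(model.classifier.parameters()),
                     "lr": base_lr})

optimizer = torch.optim.AdamW(param_groups)
```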

## Performance

The models have been evaluated on multiple financial domain tasks:

| Task | Dataset | DeB3RTa-base F1 | DeB3RTa-small F1 |
|------|---------|-----------------|------------------|
| Fake News Detection | FAKE.BR | 0.9906 | 0.9598 |
| Sentiment Analysis | CAROSIA | 0.9207 | 0.8722 |
| Regulatory Classification | BBRC | 0.7609 | 0.6712 |
| Hate Speech Detection | OFFCOMBR-3 | 0.7539 | 0.5460 |

## Training Data

The models were trained on a diverse corpus of 1.05 billion tokens, including:

- Financial market relevant facts (2003-2023)
- Financial patents (2006-2021)
- Research articles from Brazilian SciELO
- Financial news articles (1999-2023)
- Wikipedia articles in Portuguese

## Usage

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load model and tokenizer (use "higopires/DeB3RTa-small" for the smaller variant)
model = AutoModelForMaskedLM.from_pretrained("higopires/DeB3RTa-base")
tokenizer = AutoTokenizer.from_pretrained("higopires/DeB3RTa-base")

# Example usage
text = "O mercado financeiro brasileiro apresentou [MASK] no último trimestre."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.logits holds the scores for the masked position
```
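
To turn the raw logits into actual predictions for the masked token, the fill-mask pipeline can be used. A minimal sketch, assuming the repository id above:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="higopires/DeB3RTa-base")
predictions = fill_mask(
    "O mercado financeiro brasileiro apresentou [MASK] no último trimestre."
)
for pred in predictions[:3]:
    print(pred["token_str"], round(pred["score"], 4))
```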

## Citations

If you use this model in your research, please cite:

```bibtex
@article{pires2025deb3rta,
  title={DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain},
  author={Pires, Higo and Paucar, Leonardo and Carvalho, Joao Paulo},
  journal={Big Data and Cognitive Computing},
  year={2025},
  volume={1},
  number={0},
  publisher={MDPI}
}
```

## Limitations

- Performance degradation on the smaller variant, particularly for hate speech detection
- May require task-specific fine-tuning for optimal performance
- Limited evaluation on multilingual financial tasks
- Model behavior on very long documents (>128 tokens) not extensively tested
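
For documents longer than that, one option is to truncate inputs explicitly. A minimal sketch (the 128-token limit mirrors the regime the model has mainly been evaluated on):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("higopires/DeB3RTa-base")

long_document = "Texto financeiro muito longo. " * 200  # stand-in for a long document
inputs = tokenizer(long_document, truncation=True, max_length=128, return_tensors="pt")
print(inputs["input_ids"].shape)  # torch.Size([1, 128])
```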

## License

MIT License

Copyright (c) 2025 Higo Pires

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Acknowledgments

This work was supported by the Instituto Federal de Educação, Ciência e Tecnologia do Maranhão and by the Human Language Technology Lab at the Instituto de Engenharia de Sistemas e Computadores—Investigação e Desenvolvimento (INESC-ID).