---
license: mit
language:
  - en
metrics:
  - accuracy
  - f1
base_model:
  - google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
  - AI
  - Artificial-Intelligence
  - AI-Disclosure
  - Finance
  - BERT
  - Financial-NLP
  - Sentence-Classification
  - Transformers
  - Banking
  - Machine-Learning
---

BankAI-BERT

BankAI-BERT is a domain-specific BERT-based model fine-tuned for detecting AI-related disclosures in banking texts.

Intended Use

BankAI-BERT is designed to assist researchers, analysts, and regulators in identifying AI narratives in financial disclosures at the sentence level.

Performance

- Accuracy: 99.37%
- F1 score: 0.993
- ROC AUC: 1.000
- Brier score: 0.0000
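For reference, the four figures above can be computed from gold labels and predicted probabilities as in the minimal pure-Python sketch below. scikit-learn provides the same metrics as `accuracy_score`, `f1_score`, `roc_auc_score` and `brier_score_loss`. Here `p_ai` is assumed to be the predicted probability of label 1 (AI), and the function names are illustrative, not part of this repository.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of sentences whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (1 = AI-related)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], p_ai: Sequence[float]) -> float:
    """Chance that a random AI sentence outscores a random non-AI one (ties count 1/2)."""
    pos = [p for t, p in zip(y_true, p_ai) if t == 1]
    neg = [p for t, p in zip(y_true, p_ai) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], p_ai: Sequence[float]) -> float:
    """Mean squared difference between P(AI) and the 0/1 label; 0 is perfect."""
    return sum((p - t) ** 2 for t, p in zip(y_true, p_ai)) / len(y_true)
```

With the invented toy inputs `y_true = [1, 1, 0, 0]` and `p_ai = [0.9, 0.6, 0.4, 0.1]`, thresholding at 0.5 gives accuracy, F1 and ROC AUC of 1.0 but a Brier score of 0.085, because the Brier score also penalizes predictions that are correct but not confident.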

Training Data

BankAI-BERT was fine-tuned on a manually annotated dataset of sentences from U.S. bank annual reports spanning 2015 to 2023. The final training set included a balanced sample of 1,586 sentences: 793 labeled as AI-related and 793 as non-AI. The model was initialized from the `bert-base-uncased` checkpoint.
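How the 793/793 balance was reached is not described here. One common approach, shown purely as an assumption, is random undersampling of the larger class to the size of the smaller one. `balance_by_undersampling` and the `(sentence, label)` pair layout are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, seed=42):
    """Randomly downsample every class to the size of the smallest class.

    `examples` is a list of (sentence, label) pairs, label 1 = AI, 0 = non-AI.
    """
    by_label = defaultdict(list)
    for sentence, label in examples:
        by_label[label].append((sentence, label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)  # avoid all AI sentences appearing first
    return balanced
```

For example, 793 AI sentences plus an invented pool of 5,000 non-AI sentences would yield 1,586 sentences, 793 per class.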

Training

| Setting | Value |
|---|---|
| Base model | `bert-base-uncased` |
| Epochs | 3 |
| Batch size | 8 (train and eval) |
| Max sequence length | 128 tokens |
| Optimizer / LR scheduler | Hugging Face `Trainer` defaults (`AdamW`, lr 5e-5, linear decay) |
| Hardware | Google Colab GPU (NVIDIA T4) |
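A minimal sketch of a `Trainer` setup mirroring the table above, under stated assumptions: the datasets are `datasets.Dataset` objects with hypothetical `text` and `label` columns, `output_dir` is an invented path, and everything not listed in the table (warmup, weight decay, etc.) is left at its `Trainer` default. This is not the author's training script.

```python
MODEL_NAME = "bert-base-uncased"
MAX_LENGTH = 128
HYPERPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "learning_rate": 5e-5,  # same value as the Trainer default
}


def tokenize_batch(tokenizer, batch):
    """Truncate/pad each sentence to MAX_LENGTH tokens (for Dataset.map)."""
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=MAX_LENGTH)


def build_trainer(train_ds, eval_ds, output_dir="bankai-bert-out"):
    """Build a Trainer; imports are local so the constants load without transformers."""
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Two output classes: 0 = non-AI, 1 = AI.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    train_ds = train_ds.map(lambda b: tokenize_batch(tokenizer, b), batched=True)
    eval_ds = eval_ds.map(lambda b: tokenize_batch(tokenizer, b), batched=True)
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
```

Calling `build_trainer(train_ds, eval_ds).train()` would then run the three epochs; padding to a fixed 128 tokens keeps batches uniform at the cost of some wasted computation on short sentences.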

Evaluation & Robustness

- Benchmarked against TF-IDF baselines (Logistic Regression, Naive Bayes, Random Forest, and XGBoost); BankAI-BERT achieved the highest F1.
- Calibration was checked with the Brier score (lower is better; 0 = perfect).
- SHAP analysis indicates that the model relies on meaningful cues (e.g., "machine learning", "AI-powered") rather than noise, which supports interpretability and trust.
- Robustness checks:
  - Year-by-year slices: F1 ≥ 0.99 for every year from 2015 to 2023.
  - Adversarial and edge-case sentences: 100% correct in a manual test.
  - Sentence-length bias: Pearson r ≈ 0.19, a weak correlation, indicating no substantial bias.
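The year-by-year and sentence-length checks can be sketched as follows. Which two variables entered the Pearson correlation is not stated here; this sketch assumes, hypothetically, sentence length in words against the predicted probability of the AI class. The `(year, gold, predicted)` record layout and all function names are also illustrative.

```python
from collections import defaultdict
from math import sqrt


def f1_binary(y_true, y_pred):
    """F1 for label 1 (AI), written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def f1_by_year(records):
    """records: iterable of (year, gold_label, predicted_label) tuples."""
    groups = defaultdict(lambda: ([], []))
    for year, gold, pred in records:
        groups[year][0].append(gold)
        groups[year][1].append(pred)
    return {year: f1_binary(g, p) for year, (g, p) in sorted(groups.items())}


def pearson_r(xs, ys):
    """Pearson correlation (same result as statistics.correlation in Python 3.10+)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A call such as `pearson_r([len(s.split()) for s in sentences], p_ai)` (with hypothetical `sentences` and `p_ai` lists) would give the length correlation under this assumption.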

Files Included

- config.json, model.safetensors: model configuration and weights
- tokenizer.json, vocab.txt, tokenizer_config.json, special_tokens_map.json: tokenizer vocabulary and configuration

GitHub Repository

For the full pipeline, data, and visualizations, see the GitHub repository.

Citation

Please cite my paper if you use this model:

Zafar, M. B. (2025). AI in Banking Disclosures: A BERT Classifier and Corpus-Level Thematic Mapping.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bilalzafar/BankAI-BERT")
model = AutoModelForSequenceClassification.from_pretrained("bilalzafar/BankAI-BERT")

## Inference Example
from transformers import pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier("We are integrating AI into our credit risk management systems.")
print(result)
### Note: 1=AI and 0=Non-AI
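Because the model works at the sentence level, a longer disclosure needs to be split before classification. The sketch below is a hypothetical helper, not part of this repository: `split_sentences` is a naive regex splitter (a proper sentence tokenizer would be more robust), `ai_sentences` and `label_name` are invented names, and it assumes the pipeline reports `LABEL_0`/`LABEL_1`, which is what `transformers` emits when the config defines no custom `id2label` names.

```python
import re

# Assumption: 1 = AI and 0 = Non-AI, as in the note above.
LABEL_NAMES = {0: "Non-AI", 1: "AI"}


def label_name(raw_label):
    """Map 'LABEL_1' -> 'AI' and 'LABEL_0' -> 'Non-AI'; pass other strings through."""
    match = re.fullmatch(r"LABEL_(\d+)", raw_label)
    return LABEL_NAMES[int(match.group(1))] if match else raw_label


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def ai_sentences(text, classifier):
    """Return the sentences that `classifier` marks as AI-related.

    `classifier` is any callable shaped like a transformers text-classification
    pipeline: list[str] -> list[{"label": str, "score": float}].
    """
    sentences = split_sentences(text)
    results = classifier(sentences)
    return [s for s, r in zip(sentences, results) if label_name(r["label"]) == "AI"]
```

With the `classifier` pipeline from the example above, `ai_sentences(report_text, classifier)` (where `report_text` is any disclosure string) returns only the sentences flagged as AI-related.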