Update README.md
README.md CHANGED
@@ -7,12 +7,16 @@ tags:
 - finance
 - sentiment-analysis
 - financial-sentiment-analysis
+- twitter
+- tweets
+- stocks
+- stock-market
+- crypto
+- cryptocurrency
 datasets:
 - StephanAkkerman/stock-market-tweets-data
 - StephanAkkerman/financial-tweets
-- StephanAkkerman/
-- StephanAkkerman/financial-tweets-stocks
-- StephanAkkerman/financial-tweets-other
+- StephanAkkerman/crypto-stock-tweets
 metrics:
 - perplexity
 widget:
@@ -45,36 +49,31 @@ model-index:
 
 FinTwitBERT is a language model specifically pre-trained on a large dataset of financial tweets. This specialized BERT model aims to capture the unique jargon and communication style found in the financial Twitter sphere, making it an ideal tool for sentiment analysis, trend prediction, and other financial NLP tasks.
 
-## Table of Contents
-- [Dataset](#dataset)
-- [Model Details](#model-details)
-- [Installation](#installation)
-- [Usage](#usage)
-- [Training](#training)
-- [Evaluation](#evaluation)
-- [Contributing](#contributing)
-- [License](#license)
-
 ## Dataset
-FinTwitBERT is pre-trained on
+FinTwitBERT is pre-trained on several financial tweets datasets, consisting of tweets mentioning stocks and cryptocurrencies:
+- [StephanAkkerman/crypto-stock-tweets](https://huggingface.co/datasets/StephanAkkerman/crypto-stock-tweets): 8,024,269 tweets
+- [StephanAkkerman/stock-market-tweets-data](https://huggingface.co/datasets/StephanAkkerman/stock-market-tweets-data): 923,673 tweets
+- [StephanAkkerman/financial-tweets](https://huggingface.co/datasets/StephanAkkerman/financial-tweets): 263,119 tweets
 
-## Model
-
+## Model Details
+Based on the [FinBERT](https://huggingface.co/yiyanghkust/finbert-pretrain) model and tokenizer, FinTwitBERT includes additional masks (`@USER` and `[URL]`) to handle common elements in tweets. The model underwent 10 epochs of pre-training, with early stopping to prevent overfitting.
 
-
+## More Information
+For a comprehensive overview, including the complete training setup details and more, visit the [FinTwitBERT GitHub repository](https://github.com/TimKoornstra/FinTwitBERT).
 
-## Installation
-```bash
-# Clone this repository
-git clone https://github.com/TimKoornstra/FinTwitBERT
-# Install required packages
-pip install -r requirements.txt
-```
 ## Usage
-
+Using [HuggingFace's transformers library](https://huggingface.co/docs/transformers/index) the model and tokenizers can be converted into a pipeline for masked language modeling.
 
-
-
+```python
+from transformers import pipeline
+
+pipe = pipeline(
+    "fill-mask",
+    model="StephanAkkerman/FinTwitBERT",
+    tokenizer="StephanAkkerman/FinTwitBERT",
+)
+print(pipe("Bitcoin is a [MASK] coin."))
+```
 
 ## License
 This project is licensed under the MIT License. See the [LICENSE](https://choosealicense.com/licenses/mit/) file for details.
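
The Model Details text in this change says FinTwitBERT adds `@USER` and `[URL]` masks for common tweet elements. Below is a minimal sketch of how input tweets might be normalized to match, assuming user mentions and links are replaced by those two tokens before tokenization. The regular expressions and the `normalize_tweet` helper are illustrative assumptions, not code taken from the FinTwitBERT repository, and the preprocessing actually used for pre-training may differ.

```python
import re

# Assumed patterns for illustration; the real preprocessing may differ.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Match @handles that are not preceded by a word character,
# so e-mail addresses such as a@b.com are left alone.
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str) -> str:
    """Hypothetical helper: replace links with [URL] and mentions with @USER."""
    # Replace links first so an "@" inside a URL is not treated as a mention.
    text = URL_PATTERN.sub("[URL]", text)
    text = MENTION_PATTERN.sub("@USER", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())
```

For example, `normalize_tweet("@elonmusk says $TSLA is up, see https://example.com/news")` returns `"@USER says $TSLA is up, see [URL]"`. A string normalized this way could then be passed to the `fill-mask` pipeline from the Usage section, with `[MASK]` placed where a prediction is wanted.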