---
license: apache-2.0
language:
- el
metrics:
- f1
- recall
- precision
- hamming_loss
pipeline_tag: text-classification
widget:
- text: >-
    Δεν ξέρω αν είμαι ο μόνος αλλά πιστεύω πως όσο είμαστε απασχολημένοι με την όλη κατάσταση της αστυνομίας η κυβέρνηση προσπαθεί να καλύψει αλλά γεγονότα της επικαιρότητας όπως πανδημία και εξωτερική πολιτική.
  example_title: Πολιτική
- text: >-
    Άλλες οικονομίες, όπως η Κίνα, προσπαθούν να διατηρούν την αξία του νομίσματος τους χαμηλά ώστε να καταστήσουν τις εξαγωγές τους πιο ελκυστικές στο εξωτερικό. Γιατί όμως θεωρούμε πως η πτωτική πορεία της Τουρκικής λίρας είναι η "αχίλλειος πτέρνα" της Τουρκίας;
  example_title: Οικονομία
- text: >-
    Γνωρίζει κανείς γιατί δεν ψηφίζουμε πια για να βγει ποιο τραγούδι θα εκπροσωπήσει την Ελλάδα; Τα τελευταία χρόνια ο κόσμος είναι δυσαρεστημένος με τα τραγούδια που στέλνουν, γιατί συνεχίζεται αυτό;
  example_title: Ψυχαγωγία/Κουλτούρα
model-index:
- name: IMISLab/Greek-Reddit-BERT
  results:
  - task:
      type: text-classification
      name: Text-classification
    dataset:
      name: GreekReddit
      type: greekreddit
      config: default
      split: test
    metrics:
    - name: Precision
      type: precision
      value: 80.05
      verified: true
    - name: Recall
      type: recall
      value: 81.48
      verified: true
    - name: F1
      type: f1
      value: 80.61
      verified: true
    - name: Hamming Loss
      type: hamming_loss
      value: 19.84
      verified: true
datasets:
- IMISLab/GreekReddit
library_name: transformers
tags:
- Social Media
- Reddit
- Topic Classification
- Text Classification
- Greek NLP
---

# Greek-Reddit-BERT

A Greek topic classification model based on [GREEK-BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1).  
This model is fine-tuned on [GreekReddit](https://huggingface.co/datasets/IMISLab/GreekReddit) as part of our research article:  
[Mastrokostas, C., Giarelis, N., & Karacapilidis, N. (2024) Social Media Topic Classification on Greek Reddit](https://www.mdpi.com/2078-2489/15/9/521)  
For more information, see the Evaluation section below.

<img src="Greek Reddit finetuning.svg" width="600"/>

## Training dataset

The training dataset of `Greek-Reddit-BERT` is [GreekReddit](https://huggingface.co/datasets/IMISLab/GreekReddit), which is a topic classification dataset.  
Overall, [GreekReddit](https://huggingface.co/datasets/IMISLab/GreekReddit) contains 6,534 user posts collected from Greek subreddits covering various topics (e.g., society, politics, economy, entertainment/culture, sports).

## Training configuration

We fine-tuned `nlpaueb/bert-base-greek-uncased-v1` (113 million parameters) on the GreekReddit train split using the following hyperparameters and settings:
* GPU batch size = 16
* Total training epochs = 4
* Learning rate = 5e-5
* Dropout rate = 0.1
* Number of labels = 10
* 32-bit floating-point precision
* Tokenization
  * maximum input token length = 512
  * padding = True
  * truncation = True

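The settings above map fairly directly onto keyword names used by `transformers.TrainingArguments`, `BertConfig`, and tokenizer calls. The sketch below is not the authors' training script: `FineTuneConfig` and its methods are hypothetical, and the example size passed to `total_steps` is a placeholder, since this card does not give the size of the train split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Values listed in this card; the class itself is a hypothetical helper."""
    base_model: str = 'nlpaueb/bert-base-greek-uncased-v1'
    batch_size: int = 16
    epochs: int = 4
    learning_rate: float = 5e-5
    dropout: float = 0.1
    num_labels: int = 10
    max_length: int = 512

    def training_kwargs(self) -> dict:
        # Keys follow transformers.TrainingArguments keyword names.
        return {
            'per_device_train_batch_size': self.batch_size,
            'num_train_epochs': self.epochs,
            'learning_rate': self.learning_rate,
            'fp16': False,  # 32-bit floating-point precision
            'bf16': False,
        }

    def model_kwargs(self) -> dict:
        # Keys follow BertConfig attribute names.
        return {'num_labels': self.num_labels, 'hidden_dropout_prob': self.dropout}

    def tokenizer_kwargs(self) -> dict:
        # Keys follow the tokenizer __call__ keyword names.
        return {'max_length': self.max_length, 'padding': True, 'truncation': True}

    def total_steps(self, num_train_examples: int) -> int:
        # Optimizer steps, assuming one GPU and no gradient accumulation.
        return math.ceil(num_train_examples / self.batch_size) * self.epochs


config = FineTuneConfig()
print(config.training_kwargs())
print(config.tokenizer_kwargs())
print(config.total_steps(1000))  # 1000 is a placeholder, not the real split size
```

Keeping the values in one frozen object makes it easy to pass the same settings to the trainer, the model, and the tokenizer without repeating literals.
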
## Evaluation

**Model**|**Precision (%)**|**Recall (%)**|**F1 (%)**|**Hamming Loss (%)**
------------|-----------|-----------|-----------|-------------
Greek-Reddit-BERT|80.05|81.48|80.61|19.84

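This card does not say how the four scores are averaged over the 10 labels. As a rough illustration only, the sketch below computes them with macro averaging (an assumption) for a single-label setting, where Hamming loss reduces to the share of misclassified posts. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def hamming_loss(y_true, y_pred):
    """Fraction of wrong predictions (single-label case)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data with made-up labels, not taken from GreekReddit.
y_true = ['politics', 'politics', 'economy', 'sports']
y_pred = ['politics', 'economy', 'economy', 'sports']

precision, recall, f1 = macro_scores(y_true, y_pred)
print(f'precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}')
print(f'hamming_loss={hamming_loss(y_true, y_pred):.4f}')
```

Multiplying each value by 100 gives percentages on the same scale as the table.
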

## Example code
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_name = 'IMISLab/Greek-Reddit-BERT'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a classification pipeline; inputs longer than 512 tokens are truncated.
topic_classifier = pipeline(
    'text-classification',
    device='cpu',
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

# Example Greek post (same text as the "Οικονομία" widget example).
text = 'Άλλες οικονομίες, όπως η Κίνα, προσπαθούν να διατηρούν την αξία του νομίσματος τους χαμηλά ώστε να καταστήσουν τις εξαγωγές τους πιο ελκυστικές στο εξωτερικό. Γιατί όμως θεωρούμε πως η πτωτική πορεία της Τουρκικής λίρας είναι η "αχίλλειος πτέρνα" της Τουρκίας;'
output = topic_classifier(text)
# Print the predicted topic label.
print(output[0]['label'])
```
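
The example above prints only the top label. Passing `top_k=None` when calling the pipeline requests a score for every label. The sketch below ranks such scores with a hypothetical helper, `top_labels`. It assumes the result for one input is a list of `{'label', 'score'}` dicts (possibly nested one level, depending on the `transformers` version), and it runs on mock data with made-up labels and scores, since this card does not show the model's actual label strings.

```python
def top_labels(scores, k=3, min_score=0.0):
    """Return the k highest-scoring (label, score) pairs above min_score."""
    # Unwrap a singly nested list, which some pipeline versions may return.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    ranked = sorted(scores, key=lambda item: item['score'], reverse=True)
    return [
        (item['label'], item['score'])
        for item in ranked[:k]
        if item['score'] >= min_score
    ]


# Mock pipeline output: labels and scores are invented for illustration.
mock_output = [
    {'label': 'politics', 'score': 0.12},
    {'label': 'economy', 'score': 0.81},
    {'label': 'sports', 'score': 0.02},
    {'label': 'society', 'score': 0.05},
]

print(top_labels(mock_output, k=2))
```

With the real pipeline, the call would look like `top_labels(topic_classifier(text, top_k=None))`.
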
## Contact

If you have any questions or feedback about the model, please e-mail one of the following authors:
```
[email protected]
[email protected]
[email protected]
```
## Citation

```
@article{mastrokostas2024social,
  title={Social Media Topic Classification on Greek Reddit},
  author={Mastrokostas, Charalampos and Giarelis, Nikolaos and Karacapilidis, Nikos},
  journal={Information},
  volume={15},
  number={9},
  pages={521},
  year={2024},
  publisher={Multidisciplinary Digital Publishing Institute}
}
```