|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- NebulasBellum/pizdziuk_luka |
|
language: |
|
- ru |
|
pipeline_tag: text-generation |
|
tags: |
|
- legal |
|
--- |
|
|
|
# Lukashenko Generator |
|
|
|
This is a text-to-text generative AI.
The model generates phrases resembling those that
the dictator and fascist from Belarus, **Aliaksandr Lukashenko**,
could say.
|
|
|
## Documentation |
|
|
|
- [Description](#description) |
|
- [Quick Start](#quick-start) |
|
- [Try in HF space](#try-in-hf-space)
|
|
|
## Description |
|
|
|
The model was trained on the dataset [NebulasBellum/pizdziuk_luka](https://huggingface.co/datasets/NebulasBellum/pizdziuk_luka)
(also available as a [parquet dataset version](https://huggingface.co/datasets/NebulasBellum/pizdziuk_luka/tree/refs%2Fconvert%2Fparquet)),
which was collected from the Telegram channel **Pul Pervogo**.
Only with the help of this channel could we do this great
job of recreating the speech of the **fascist dictator Lukashenko** :)
|
|
|
The model was trained for **340** epochs and produces very good results:

```commandline
loss: 0.2963 - accuracy: 0.9159
```
|
|
|
The model is continuously improved using extended datasets, which
themselves keep growing as new speeches of the **fascist Lukashenko** are added.
All information is collected from public sources, and not only :) (thanks to our partizans).
|
|
|
Right now the model folder [NebulasBellum/Lukashenko_tarakan](https://huggingface.co/NebulasBellum/Lukashenko_tarakan/tree/main)
contains all the files necessary to download and use the model with the `TensorFlow` library,
including the trained weights `weights_lukash.h5`.
|
|
|
|
|
## Quick Start |
|
|
|
To use this model with the `TensorFlow` library you need to:
|
|
|
1. Download the model: |
|
|
|
```commandline
mkdir Luka_Pizdziuk
cd Luka_Pizdziuk
git clone https://huggingface.co/NebulasBellum/Lukashenko_tarakan
```
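The script in the next step relies on Keras' `pad_sequences(..., padding='pre')` to force every input to a fixed length. Its behavior can be sketched in plain Python (an illustrative helper with the hypothetical name `pad_pre`, no TensorFlow required):

```python
def pad_pre(seq, maxlen):
    """Left-truncate, then left-pad with zeros to exactly maxlen,
    mirroring tf.keras pad_sequences(padding='pre', truncating='pre')."""
    seq = list(seq)[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

print(pad_pre([5, 6], 4))           # a short sequence gets zeros on the left
print(pad_pre([1, 2, 3, 4, 5], 4))  # a long sequence keeps only its last 4 tokens
```

Padding on the left (rather than the right) keeps the most recent tokens next to the prediction position, which is what the LSTM sees last.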
|
|
|
2. Create the Python script:
|
|
|
```python
import numpy as np
import tensorflow as tf

# Seed to start the generation of the Lukashenko speech
seed_text = 'я не глядя поддержу'
weights_path = 'weights_lukash.h5'
model_path = 'Lukashenko_tarakan'

# Load the saved Keras model and its weights
model = tf.keras.models.load_model(model_path)
model.load_weights(weights_path)
# Show the model summary
model.summary()

# Load the source dataset (needed to rebuild the tokenizer)
with open('source_text_lukash.txt', 'r', encoding='utf-8') as source_text_file:
    data = source_text_file.read().splitlines()

# Drop very short lines and compute the average sentence length in words
data = [line for line in data if len(line) >= 5]
sent_length = sum(len(line.split()) for line in data)
lstm_length = int(sent_length / len(data))

# Tokenize the dataset
token = tf.keras.preprocessing.text.Tokenizer()
token.fit_on_texts(data)
encoded_text = token.texts_to_sequences(data)
# Vocabulary size (+1 for the reserved padding index 0)
vocab_size = len(token.word_counts) + 1

# Create the n-gram prefix sequences
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])

max_length = 20
sequences = tf.keras.preprocessing.sequence.pad_sequences(datalist, maxlen=max_length, padding='pre')

# X - input data, y - target data
X = sequences[:, :-1]
y = sequences[:, -1]

y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]

# Build an index -> word lookup once, instead of scanning word_index per step
index_word = {index: word for word, index in token.word_index.items()}

# Generate the Lukashenko speech from the seed
generated_text = ''
number_lines = 3
for i in range(number_lines):
    text_word_list = []
    for _ in range(lstm_length * 2):
        encoded = token.texts_to_sequences([seed_text])
        encoded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=seq_length, padding='pre')

        # Pick the most probable next word (greedy decoding)
        y_pred = int(np.argmax(model.predict(encoded), axis=-1)[0])
        predicted_word = index_word.get(y_pred, '')

        seed_text = seed_text + ' ' + predicted_word
        text_word_list.append(predicted_word)

    # Start the next line from the last predicted word
    seed_text = text_word_list[-1]
    generated_text += ' '.join(text_word_list) + '\n'

print(f"Lukashenko is saying: {generated_text}")
```
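The generation loop above is plain greedy decoding: at each step the single most probable next word is appended to the text and fed back in as the new context. The idea in isolation, with a stub in place of the real model (hypothetical names, no TensorFlow required):

```python
def greedy_generate(seed, predict_next, steps):
    """Repeatedly append the most likely next word to the running text."""
    words = seed.split()
    for _ in range(steps):
        words.append(predict_next(words))
    return ' '.join(words)

# Stub "model": deterministically cycles through a fixed word list,
# standing in for model.predict + argmax in the real script
cycle = ['мы', 'будем', 'работать']
stub = lambda words: cycle[len(words) % len(cycle)]

print(greedy_generate('я поддержу', stub, 4))
```

Because greedy decoding always takes the argmax, the same seed always yields the same output; sampling from the predicted distribution instead would make the generated speeches more varied.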
|
|
|
## Try in HF space |
|
|
|
A ready-to-use Space with the working model is available here:

[Hugging Face Test Space](https://huggingface.co/spaces/NebulasBellum/gen_ai_simple_space)
|
|
|
|
|
**To contribute to the project you can donate:**

- TRX: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb
- BNB: 0x107119102c2EC84099cDce3D5eFDE2dcbf4DEB2a
- USDT TRC20: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb

25% goes to helping Ukraine win.
|
|
|
|