|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- NebulasBellum/pizdziuk_luka |
|
language: |
|
- ru |
|
pipeline_tag: text-generation |
|
tags: |
|
- legal |
|
--- |
|
|
|
# Lukashenko Generator |
|
|
|
This is a text-to-text generative AI.
The model generates phrases resembling those that
the dictator and fascist from Belarus, **Aliaksandr Lukashenko**,
could say.
|
|
|
## Documentation |
|
|
|
- [Description](#description) |
|
- [Quick Start](#quick-start) |
|
- [Try in HF space](#try-in-hf-space)
|
|
|
## Description |
|
|
|
The model was trained on the dataset [NebulasBellum/pizdziuk_luka](https://huggingface.co/datasets/NebulasBellum/pizdziuk_luka)
(also available as a [parquet dataset version](https://huggingface.co/datasets/NebulasBellum/pizdziuk_luka/tree/refs%2Fconvert%2Fparquet)),
which was collected from the Telegram channel **Pul Pervogo**.
Only with the help of this channel could we do this great
job of recreating the speech of the **fascist dictator Lukashenko** :)
|
|
|
The model was trained for **340** epochs and produces very good results:

```commandline
loss: 0.2963 - accuracy: 0.9159
```
|
|
|
The model is continuously improved using extended datasets, which
themselves keep growing as new speeches of the **fascist Lukashenko** are added.
All information is collected from public sources, and not only :) (thanks to our partizans).
|
|
|
Right now the model folder [NebulasBellum/Lukashenko_tarakan](https://huggingface.co/NebulasBellum/Lukashenko_tarakan/tree/main)
contains all the files necessary to download and use the model with the `TensorFlow` library,
including the trained weights `weights_lukash.h5`.
|
|
|
|
|
## Quick Start |
|
|
|
To use this model with the `TensorFlow` library you need to:
|
|
|
1. Download the model: |
|
|
|
```commandline
mkdir Luka_Pizdziuk
cd Luka_Pizdziuk
git clone https://huggingface.co/NebulasBellum/Lukashenko_tarakan
```
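The script in the next step relies on Keras' `pad_sequences(..., padding='pre')` to force every input to a fixed length. Its behavior can be sketched in plain Python (an illustrative helper with the hypothetical name `pad_pre`, no TensorFlow required):

```python
def pad_pre(seq, maxlen):
    """Left-truncate, then left-pad with zeros to exactly maxlen,
    mirroring tf.keras pad_sequences(padding='pre', truncating='pre')."""
    seq = list(seq)[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

print(pad_pre([5, 6], 4))           # a short sequence gets zeros on the left
print(pad_pre([1, 2, 3, 4, 5], 4))  # a long sequence keeps only its last 4 tokens
```

Padding on the left (rather than the right) keeps the most recent tokens next to the prediction position, which is what the LSTM sees last.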
|
|
|
2. Create the Python script:
|
|
|
```python
import numpy as np
import tensorflow as tf

# Seed to start the generation of the Lukashenko speech
seed_text = 'я не глядя поддержу'
weights_path = 'weights_lukash.h5'
model_path = 'Lukashenko_tarakan'

# Load the saved Keras model and its weights
model = tf.keras.models.load_model(model_path)
model.load_weights(weights_path)
# Show the model summary
model.summary()

# Load the source dataset (needed to rebuild the tokenizer)
with open('source_text_lukash.txt', 'r', encoding='utf-8') as source_text_file:
    data = source_text_file.read().splitlines()

# Drop very short lines and compute the average sentence length in words
data = [line for line in data if len(line) >= 5]
sent_length = sum(len(line.split()) for line in data)
lstm_length = int(sent_length / len(data))

# Tokenize the dataset
token = tf.keras.preprocessing.text.Tokenizer()
token.fit_on_texts(data)
encoded_text = token.texts_to_sequences(data)
# Vocabulary size (+1 for the reserved padding index 0)
vocab_size = len(token.word_counts) + 1

# Create the n-gram prefix sequences
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])

max_length = 20
sequences = tf.keras.preprocessing.sequence.pad_sequences(datalist, maxlen=max_length, padding='pre')

# X - input data, y - target data
X = sequences[:, :-1]
y = sequences[:, -1]

y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]

# Build an index -> word lookup once, instead of scanning word_index per step
index_word = {index: word for word, index in token.word_index.items()}

# Generate the Lukashenko speech from the seed
generated_text = ''
number_lines = 3
for i in range(number_lines):
    text_word_list = []
    for _ in range(lstm_length * 2):
        encoded = token.texts_to_sequences([seed_text])
        encoded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=seq_length, padding='pre')

        # Pick the most probable next word (greedy decoding)
        y_pred = int(np.argmax(model.predict(encoded), axis=-1)[0])
        predicted_word = index_word.get(y_pred, '')

        seed_text = seed_text + ' ' + predicted_word
        text_word_list.append(predicted_word)

    # Start the next line from the last predicted word
    seed_text = text_word_list[-1]
    generated_text += ' '.join(text_word_list) + '\n'

print(f"Lukashenko is saying: {generated_text}")
```
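The generation loop above is plain greedy decoding: at each step the single most probable next word is appended to the text and fed back in as the new context. The idea in isolation, with a stub in place of the real model (hypothetical names, no TensorFlow required):

```python
def greedy_generate(seed, predict_next, steps):
    """Repeatedly append the most likely next word to the running text."""
    words = seed.split()
    for _ in range(steps):
        words.append(predict_next(words))
    return ' '.join(words)

# Stub "model": deterministically cycles through a fixed word list,
# standing in for model.predict + argmax in the real script
cycle = ['мы', 'будем', 'работать']
stub = lambda words: cycle[len(words) % len(cycle)]

print(greedy_generate('я поддержу', stub, 4))
```

Because greedy decoding always takes the argmax, the same seed always yields the same output; sampling from the predicted distribution instead would make the generated speeches more varied.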
|
|
|
## Try in HF space |
|
|
|
A ready-to-use Space with the working model is available here:

[Hugging Face Test Space](https://huggingface.co/spaces/NebulasBellum/gen_ai_simple_space)
|
|
|
|
|
**To contribute to the project you can donate:**

- TRX: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb
- BNB: 0x107119102c2EC84099cDce3D5eFDE2dcbf4DEB2a
- USDT TRC20: TDqjSX6dB6eaFbpHRhX8CCZUSYDmMVvvmb

25% goes to helping Ukraine win.
|
|
|
|