Strela is a powerful language model designed to deliver fast, high-quality responses even on low-end devices. Strela is recommended for the following purposes:
- Chatbot for dialogue
- Story writing
- Song writing
- Translation between Russian and English
- Cases where using heavier models would be inefficient
Description from Strela itself
I am a computer program designed to process and analyze natural language. I am able to understand, analyze, and process natural language, which allows me to communicate with people through various communication channels. My main goal is to help people solve tasks and to provide information in response to their requests. I can be used for many purposes: from automatic text generation and translation between languages to writing your own verses and songs.
Using the model online
You can try it out here.
Using the model for in-app chat
GPT4All is recommended; it supports GGUF, so you will need to download the GGUF version of the model.
Using the model for Unity chat
LLM for Unity is recommended; it also supports GGUF, so you will need to download the GGUF version of the model.
Using the quantized model for chat in Python | Recommended
First, install the gpt4all package:
pip install gpt4all
Then download the GGUF version of the model and move the file to your script's directory.
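If you would rather fetch the file programmatically, the sketch below uses huggingface_hub; the repository id and filename are assumptions based on the identifiers used elsewhere in this card, so adjust them to wherever the GGUF build is actually hosted.
# Hypothetical download sketch; repo_id and filename are assumptions
import os
from huggingface_hub import hf_hub_download
# Places strela-q4_k_m.gguf in the current working directory
hf_hub_download(
    repo_id="gai-labs/strela",        # assumed location of the GGUF build
    filename="strela-q4_k_m.gguf",    # filename expected by the script below
    local_dir=os.getcwd(),
)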
# Library Imports
import os
from gpt4all import GPT4All
# Initializing the model from the strela-q4_k_m.gguf file in the current directory
model = GPT4All(model_name='strela-q4_k_m.gguf', model_path=os.getcwd())
# Callback function to stop generation if Strela emits the '#' symbol, which marks the beginning of a role declaration
def stop_on_token_callback(token_id, token_string):
    if '#' in token_string:
        return False
    else:
        return True
# System prompt
system_template = """### System:
You are an AI assistant who gives a helpful response to whatever humans ask of you.
"""
# Human and AI prompt
prompt_template = """
### Human:
{0}
### Assistant:
"""
# Chat session
with model.chat_session(system_template, prompt_template):
print("To exit, enter 'Exit'")
while True:
print('')
user_input = input(">>> ")
if user_input.lower() != "exit":
# Streaming generation
for token in model.generate(user_input, streaming=True, callback=stop_on_token_callback):
print(token, end='')
else:
break
To exit, enter 'Exit'
>>> Hello
Hello! How can I help you today?
>>>
Using the full model for chat in Python
# Library Imports
from transformers import AutoTokenizer, AutoModelForCausalLM
# Loading the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gai-labs/strela")
model = AutoModelForCausalLM.from_pretrained("gai-labs/strela")
# System prompt
system_prompt = "You are an AI assistant who gives a helpful response to whatever humans ask of you."
# Your prompt
prompt = "Hello!"
# Chat template
chat = f"""### System:
{system_prompt}
### Human:
{prompt}
### Assistant:
"""
# Generation
model_inputs = tokenizer([chat], return_tensors="pt")
generated_ids = model.generate(**model_inputs, max_new_tokens=64) # Adjust the maximum token count for generation
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Remove the chat prompt from the generated output
output = output.replace(chat, "")
# Print the generated reply
print(output)
Hello! How can I help?
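Running the full model on a CPU can be slow. If a CUDA-capable GPU is available, you can move the model and the inputs onto it before generating; a minimal sketch, assuming the transformers setup above:
import torch
# Use the GPU when available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# The inputs must live on the same device as the model
model_inputs = tokenizer([chat], return_tensors="pt").to(device)
generated_ids = model.generate(**model_inputs, max_new_tokens=64)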
Using the model for text generation in Python
# Library Imports
from transformers import AutoTokenizer, AutoModelForCausalLM
# Loading the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gai-labs/strela")
model = AutoModelForCausalLM.from_pretrained("gai-labs/strela")
# Prompt
prompt = "AI - "
# Generation
model_inputs = tokenizer([prompt], return_tensors="pt")
generated_ids = model.generate(**model_inputs, max_new_tokens=64) # Adjust the maximum token count for generation
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Print the generation result
print(output)
AI - is a field of computer science and technology that deals with creating machines capable of "understanding" humans or performing tasks with logic similar to that of humans.
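By default, generate decodes greedily, so the same prompt always yields the same continuation. To get more varied text, you can enable sampling; a minimal sketch, with parameter values chosen purely for illustration, not tuned for Strela:
# Sampled generation; the parameter values are illustrative assumptions
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=64,
    do_sample=True,    # sample from the token distribution instead of greedy decoding
    temperature=0.7,   # values below 1.0 sharpen the distribution toward likely tokens
    top_p=0.9,         # nucleus sampling: keep the smallest token set with 90% probability mass
)
output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(output)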