Model Card

Model Details

  • Developed by: Amar-89
  • Model type: 8-bit quantized causal language model
  • License: MIT
  • Quantized from model: meta-llama/Llama-3.1-8B-Instruct
  • Model size: 8.03B parameters, ~9.1 GB on disk (tensor types: F32, BF16, I8)

The model uses the tokenizer from the base model; no changes were made beyond 8-bit quantization. Recommended: at least 12 GB of VRAM.
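
An 8-bit quantization like this one is typically produced with bitsandbytes through transformers. The snippet below is a minimal sketch of that process under that assumption, not the exact script used for this model; the output path is hypothetical.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base = "meta-llama/Llama-3.1-8B-Instruct"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # 8-bit weights via bitsandbytes

# Load the base model in 8-bit and save the quantized checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base)

model.save_pretrained("Llama-3.1-8B-Instruct-8bit")  # hypothetical output path
tokenizer.save_pretrained("Llama-3.1-8B-Instruct-8bit")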

How to use

First install the dependencies:

pip install -q -U torch bitsandbytes transformers accelerate

Then load the model and tokenizer:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Amar-89/Llama-3.1-8B-Instruct-8bit"
# The 8-bit quantization config is stored with the checkpoint, so bitsandbytes
# dequantizes on the fly; device_map="auto" places the weights on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
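
# Optional sanity check before starting a chat session: run a single prompt
# through the model's chat template (a minimal sketch, not part of the original card).
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Say hello."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=32)[0], skip_special_tokens=True))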

def terminal_chat(model, tokenizer, system_prompt):
    """
    Starts a terminal-based chat session with a specified model, tokenizer, and system prompt.

    Args:
        model: The Hugging Face model object.
        tokenizer: The Hugging Face tokenizer object.
        system_prompt: The system role or instruction to define the chat behavior.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    messages = [{"role": "system", "content": system_prompt}]
    print("Chat session started. Type 'exit' to quit.")

    while True:
        user_input = input("User: ")
        if user_input.lower() == "exit":
            print("Ending chat session. Goodbye!")
            break

        messages.append({"role": "user", "content": user_input})

        outputs = pipe(messages, max_new_tokens=256)

        # The pipeline returns the whole conversation; the final message is the
        # assistant's reply. Append it so multi-turn context is preserved.
        response = outputs[0]["generated_text"][-1]["content"]
        messages.append({"role": "assistant", "content": response})
        print(f"Assistant: {response}")


system_prompt = "You are a pirate chatbot who always responds in pirate speak!"

terminal_chat(model, tokenizer, system_prompt)
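
To confirm the loaded model fits the 12 GB VRAM recommendation, you can query its in-memory footprint (a small addition to the card's example, not part of the original):

print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")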