Falcon3-7B-Instruct OpenVINO INT4

This repository contains the tiiuae/Falcon3-7B-Instruct model optimized for inference with Intel's OpenVINO runtime. The weights have been quantized to INT4 using the AWQ quantization scheme, which reduces memory footprint and improves inference speed while preserving generation quality.

Model Details

  • Original Model: tiiuae/Falcon3-7B-Instruct
  • Model Type: Instruction-tuned Large Language Model
  • Parameters: 7B
  • Quantization: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
  • Group Size: -1 (per-channel quantization)

Optimization Details

This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:

optimum-cli export openvino \
  -m tiiuae/Falcon3-7B-Instruct \
  --weight-format int4 \
  --sym \
  --dataset auto \
  --awq \
  --group-size -1 \
  falcon3-7b-instruct-int4-sym-ov
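
The same export can also be driven from Python. The sketch below mirrors the CLI flags above using Optimum Intel's OVWeightQuantizationConfig; exact keyword support (notably quant_method="awq" and dataset="auto") can vary between optimum-intel versions, so treat this as an illustration of the settings rather than a drop-in script:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirrors the CLI flags: 4-bit symmetric weights, per-channel (group_size=-1), AWQ
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,
    group_size=-1,
    quant_method="awq",   # assumption: string form accepted by this optimum-intel version
    dataset="auto",
)

model = OVModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-7B-Instruct",
    export=True,
    quantization_config=quantization_config,
)
model.save_pretrained("falcon3-7b-instruct-int4-sym-ov")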

Usage

Prerequisites

  • OpenVINO 2024.0 or newer
  • optimum-intel
  • transformers
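
A typical installation pulls all of these in via the optimum[openvino] extra, which installs optimum-intel together with a compatible OpenVINO runtime:

pip install --upgrade "optimum[openvino]" transformers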

Sample Inference code with Optimum Intel

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
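
Since Falcon3-7B-Instruct is instruction-tuned, prompts generally work best when built with the tokenizer's chat template. A minimal sketch, assuming the bundled tokenizer ships the upstream chat template:

messages = [
    {"role": "user", "content": "Write a short story about a robot learning to paint."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))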

Sample Inference code with OpenVINO GenAI

  1. Install packages required for using OpenVINO GenAI.
pip install openvino-genai huggingface_hub
  2. Download the model and run inference.
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
model_path = "falcon3-7b-instruct-int4-sym-ov"

# Download the OpenVINO model files from the Hugging Face Hub
hf_hub.snapshot_download(model_id, local_dir=model_path)

# Build the pipeline; change "CPU" to another supported device (e.g. "GPU") if available
device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("What is OpenVINO?", max_length=200))
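
OpenVINO GenAI can also stream tokens as they are generated via a callback. A minimal sketch following the pattern in the openvino-genai samples (returning False from the callback tells the pipeline to continue):

def streamer(subword):
    print(subword, end="", flush=True)
    return False  # returning True would stop generation early

pipe.generate("What is OpenVINO?", max_new_tokens=200, streamer=streamer)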

License

This model inherits the license of the original tiiuae/Falcon3-7B-Instruct model.
