Falcon3-7B-Instruct OpenVINO INT4

This repository contains the tiiuae/Falcon3-7B-Instruct model optimized for inference with Intel's OpenVINO runtime. The weights have been quantized to INT4 using the AWQ quantization scheme, which reduces memory footprint and improves inference speed while preserving generation quality.

Model Details

  • Original Model: tiiuae/Falcon3-7B-Instruct
  • Model Type: Instruction-tuned Large Language Model
  • Parameters: 7B
  • Quantization: INT4 Symmetric AWQ (Activation-aware Weight Quantization)
  • Group Size: -1 (per-channel quantization)

Optimization Details

This model was converted from the original Hugging Face model to OpenVINO format using the Optimum Intel library. The following optimization command was used:

optimum-cli export openvino \
  -m tiiuae/Falcon3-7B-Instruct \
  --weight-format int4 \
  --sym \
  --dataset auto \
  --awq \
  --group-size -1 \
  falcon3-7b-instruct-int4-sym-ov
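
The same export can also be driven from Python. The sketch below mirrors the CLI flags above using Optimum Intel's OVWeightQuantizationConfig; exact keyword support (notably quant_method="awq" and dataset="auto") can vary between optimum-intel versions, so treat this as an illustration of the settings rather than a drop-in script:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Mirrors the CLI flags: 4-bit symmetric weights, per-channel (group_size=-1), AWQ
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    sym=True,
    group_size=-1,
    quant_method="awq",   # assumption: string form accepted by this optimum-intel version
    dataset="auto",
)

model = OVModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-7B-Instruct",
    export=True,
    quantization_config=quantization_config,
)
model.save_pretrained("falcon3-7b-instruct-int4-sym-ov")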

Usage

Prerequisites

  • OpenVINO 2024.0 or newer
  • optimum-intel
  • transformers
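
A typical installation pulls all of these in via the optimum[openvino] extra, which installs optimum-intel together with a compatible OpenVINO runtime:

pip install --upgrade "optimum[openvino]" transformers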

Sample Inference code with Optimum Intel

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load tokenizer and model
model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id)

# Generate text
prompt = "Write a short story about a robot learning to paint:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
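
Since Falcon3-7B-Instruct is instruction-tuned, prompts generally work best when built with the tokenizer's chat template. A minimal sketch, assuming the bundled tokenizer ships the upstream chat template:

messages = [
    {"role": "user", "content": "Write a short story about a robot learning to paint."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))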

Sample Inference code with OpenVINO GenAI

  1. Install packages required for using OpenVINO GenAI.
pip install openvino-genai huggingface_hub
  2. Download the model and run inference.
import huggingface_hub as hf_hub
import openvino_genai as ov_genai

model_id = "rpanchum/falcon3-7b-instruct-int4-sym-ov"
model_path = "falcon3-7b-instruct-int4-sym-ov"

# Download the OpenVINO model files from the Hugging Face Hub
hf_hub.snapshot_download(model_id, local_dir=model_path)

# Build the pipeline; change "CPU" to another supported device (e.g. "GPU") if available
device = "CPU"
pipe = ov_genai.LLMPipeline(model_path, device)
print(pipe.generate("What is OpenVINO?", max_length=200))
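
OpenVINO GenAI can also stream tokens as they are generated via a callback. A minimal sketch following the pattern in the openvino-genai samples (returning False from the callback tells the pipeline to continue):

def streamer(subword):
    print(subword, end="", flush=True)
    return False  # returning True would stop generation early

pipe.generate("What is OpenVINO?", max_new_tokens=200, streamer=streamer)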

License

This model inherits the license of the original tiiuae/Falcon3-7B-Instruct model.
