---
base_model:
- LiquidAI/LFM2-1.2B
library_name: transformers.js
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- edge
---
<center>
<div style="text-align: center;">
<img
src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
alt="Liquid AI"
style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
/>
</div>
<a href="https://playground.liquid.ai/chat">
<svg width="114.8" height="20" viewBox="0 0 1300 200" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Liquid Playground" style="margin-bottom: 1em;">
<title>Liquid: Playground</title>
<g>
<rect fill="#fff" width="600" height="200"></rect>
<rect fill="url(#x)" x="600" width="700" height="200"></rect>
</g>
<g transform="translate(20, 30) scale(0.4, 0.4)">
<path d="M172.314 129.313L172.219 129.367L206.125 188.18C210.671 195.154 213.324 203.457 213.324 212.382C213.324 220.834 210.956 228.739 206.839 235.479L275.924 213.178L167.853 33.6L141.827 76.9614L172.314 129.313Z" fill="black"/>
<path d="M114.217 302.4L168.492 257.003C168.447 257.003 168.397 257.003 168.352 257.003C143.515 257.003 123.385 237.027 123.385 212.387C123.385 203.487 126.023 195.204 130.55 188.24L162.621 132.503L135.966 86.7327L60.0762 213.183L114.127 302.4H114.217Z" fill="black"/>
<path d="M191.435 250.681C191.435 250.681 191.43 250.681 191.425 250.686L129.71 302.4H221.294L267.71 226.593L191.435 250.686V250.681Z" fill="black"/>
</g>
<g aria-hidden="true" fill="#fff" text-anchor="start" font-family="Verdana,DejaVu Sans,sans-serif" font-size="110">
<text x="200" y="148" textLength="329" fill="#000" opacity="0.1">Liquid</text>
<text x="190" y="138" textLength="329" fill="#000">Liquid</text>
<text x="655" y="148" textLength="619" fill="#000" opacity="0.1">Playground</text>
<text x="645" y="138" textLength="619">Playground</text>
</g>
<linearGradient id="x" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" style="stop-color:#000000"></stop>
<stop offset="100%" style="stop-color:#000000"></stop>
</linearGradient>
</svg>
</a>
</center>
# LFM2-1.2B
LFM2 is a new generation of hybrid models developed by [Liquid AI](https://www.liquid.ai/), specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.

We're releasing the weights of three post-trained checkpoints with 350M, 700M, and 1.2B parameters. They provide the following key features to create AI-powered edge applications:

* **Fast training & inference**: LFM2 achieves 3x faster training compared to its previous generation. It also benefits from 2x faster decode and prefill speed on CPU compared to Qwen3.
* **Best performance**: LFM2 outperforms similarly-sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
* **New architecture**: LFM2 is a new hybrid Liquid model with multiplicative gates and short convolutions.
* **Flexible deployment**: LFM2 runs efficiently on CPU, GPU, and NPU hardware for flexible deployment on smartphones, laptops, or vehicles.

Find more information about LFM2 in our [blog post](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models).

## Model details

Due to their small size, **we recommend fine-tuning LFM2 models on narrow use cases** to maximize performance.
They are particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations.
However, we do not recommend using them for tasks that are knowledge-intensive or require programming skills.

| Property            | Value                 |
| ------------------- | --------------------- |
| **Parameters**      | 1,170,340,608         |
| **Layers**          | 16 (10 conv + 6 attn) |
| **Context length**  | 32,768 tokens         |
| **Vocabulary size** | 65,536                |
| **Precision**       | bfloat16              |
| **Training budget** | 10 trillion tokens    |
| **License**         | LFM Open License v1.0 |

**Supported languages**: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

**Generation parameters**: We recommend the following sampling parameters (see the sketch after this list):

* `temperature=0.3`
* `min_p=0.15`
* `repetition_penalty=1.05`
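
As a rough sketch, these settings map directly onto the generation options of a Transformers.js pipeline (the option names mirror Python's `GenerationConfig`; `min_p` support in your Transformers.js version is an assumption, not a guarantee):

```js
// Hedged sketch: applying the recommended sampling parameters with the
// text-generation pipeline created in the basic example below.
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: true,          // enable sampling so temperature/min_p take effect
  temperature: 0.3,
  min_p: 0.15,              // assumed supported; mirrors the Python min_p warper
  repetition_penalty: 1.05,
});
```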

**Architecture**: Hybrid model with multiplicative gates and short convolutions: 10 double-gated short-range LIV convolution blocks and 6 grouped query attention (GQA) blocks.

**Pre-training mixture**: Approximately 75% English, 20% multilingual, and 5% code data sourced from the web and licensed materials.

**Training approach**:

* Knowledge distillation using [LFM1-7B](https://www.liquid.ai/blog/introducing-lfm-7b-setting-new-standards-for-efficient-language-models) as teacher model
* Very large-scale SFT on 50% downstream tasks, 50% general domains
* Custom DPO with length normalization and semi-online datasets
* Iterative model merging

## How to run LFM2

### Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:

```bash
npm i @huggingface/transformers
```

**Example**: Basic text generation
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-1.2B-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris.
```
**Example**: Tool calling
```js
import { AutoModelForCausalLM, AutoTokenizer, TextStreamer } from "@huggingface/transformers";

// Load tokenizer and model
const model_id = "onnx-community/LFM2-1.2B-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModelForCausalLM.from_pretrained(
  model_id, { dtype: "q4", device: "webgpu" },
);

// Define tools and messages
const tools = [
  {
    name: "get_weather",
    description: "Get current weather information for a location",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "The city and state, e.g. San Francisco, CA",
        },
        unit: {
          type: "string",
          enum: ["celsius", "fahrenheit"],
          description: "The unit of temperature to use",
        },
      },
      required: ["location"],
    },
  },
];
const messages = [
  {
    role: "user",
    content: "What's the weather like in New York?",
  },
];

// Prepare inputs
const input = tokenizer.apply_chat_template(messages, {
  tools,
  add_generation_prompt: true,
  return_dict: true,
});

// Generate output
const sequences = await model.generate({
  ...input,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(tokenizer, { skip_prompt: true, skip_special_tokens: false }),
});

// Decode and print the generated text
const response = tokenizer.batch_decode(
  sequences.slice(null, [input.input_ids.dims[1], null]),
  { skip_special_tokens: true },
);
console.log(response[0]); // [get_weather(location="New York", unit="fahrenheit")]
```
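
The model emits tool calls as a Pythonic call string, as shown in the comment above. Below is a minimal, hedged sketch of completing the round trip; the regex, the stubbed weather result, and the `"tool"` message role are illustrative assumptions rather than a documented API:

```js
// Hypothetical follow-up turn: parse the emitted call, execute the tool,
// then append the result so the model can produce a final answer.
const match = response[0].match(/\[(\w+)\((.*)\)\]/);
if (match) {
  const tool_result = JSON.stringify({ temperature: 22, unit: "fahrenheit" }); // stubbed tool output
  messages.push(
    { role: "assistant", content: response[0] },
    { role: "tool", content: tool_result }, // role name is an assumption
  );
  // Re-apply the chat template with the extended messages and call
  // model.generate(...) again to obtain the natural-language reply.
}
```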
### ONNXRuntime
```py
from transformers import AutoConfig, AutoTokenizer
import onnxruntime
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Load config, tokenizer, and model
model_id = "onnx-community/LFM2-1.2B-ONNX"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
filename = "model.onnx"  # Options: "model.onnx", "model_fp16.onnx", "model_q4.onnx", "model_q4f16.onnx"
model_path = hf_hub_download(repo_id=model_id, filename=f"onnx/{filename}")  # Download the graph
hf_hub_download(repo_id=model_id, filename=f"onnx/{filename}_data")  # Download the weights
session = onnxruntime.InferenceSession(model_path)

## Set config values
num_key_value_heads = config.num_key_value_heads
head_dim = config.hidden_size // config.num_attention_heads
num_hidden_layers = config.num_hidden_layers
eos_token_id = config.eos_token_id
hidden_size = config.hidden_size
conv_L_cache = config.conv_L_cache
layer_types = config.layer_types

# 2. Prepare inputs
prompt = "What is C. elegans?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="np")
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']
batch_size = input_ids.shape[0]
position_ids = np.tile(np.arange(0, input_ids.shape[-1]), (batch_size, 1))
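
# LFM2 is a hybrid stack: full-attention layers use a standard key/value cache,
# while short-convolution layers keep a fixed-length state of shape
# [batch, hidden_size, conv_L_cache]. Both start empty before the first forward pass.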
past_cache_values = {}
for i in range(num_hidden_layers):
    if layer_types[i] == 'full_attention':
        for kv in ('key', 'value'):
            past_cache_values[f'past_key_values.{i}.{kv}'] = np.zeros([batch_size, num_key_value_heads, 0, head_dim], dtype=np.float32)
    elif layer_types[i] == 'conv':
        past_cache_values[f'past_conv.{i}'] = np.zeros([batch_size, hidden_size, conv_L_cache], dtype=np.float32)
    else:
        raise ValueError(f"Unsupported layer type: {layer_types[i]}")

# 3. Generation loop
max_new_tokens = 1024
generated_tokens = np.array([[]], dtype=np.int64)
for i in range(max_new_tokens):
    logits, *present_cache_values = session.run(None, dict(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        **past_cache_values,
    ))

    ## Update values for next generation loop
    input_ids = logits[:, -1].argmax(-1, keepdims=True)
    attention_mask = np.concatenate([attention_mask, np.ones_like(input_ids, dtype=np.int64)], axis=-1)
    position_ids = position_ids[:, -1:] + 1
    for j, key in enumerate(past_cache_values):
        past_cache_values[key] = present_cache_values[j]

    generated_tokens = np.concatenate([generated_tokens, input_ids], axis=-1)
    if (input_ids == eos_token_id).all():
        break

    ## (Optional) Streaming
    print(tokenizer.decode(input_ids[0]), end='', flush=True)
print()

# 4. Output result
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```