Qwen2.5-7B-Instruct-kowiki-qa-4bit (MLX-converted model)

Requirements

  • pip install mlx-lm

Usage

  • Generate with CLI

    mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit --prompt "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"
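
    Generation options such as the maximum number of new tokens can be passed as flags. A minimal sketch, assuming a recent mlx-lm release (run mlx_lm.generate --help to confirm the exact flag names):

    mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit \
        --prompt "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?" \
        --max-tokens 512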
    
  • In Python

    from mlx_lm import load, generate
    
    model, tokenizer = load(
        "mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit",
        tokenizer_config={"trust_remote_code": True},
    )
    
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 μΉœμ² ν•œ μ±—λ΄‡μž…λ‹ˆλ‹€."},
        {"role": "user", "content": prompt},
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    
    text = generate(
        model,
        tokenizer,
        prompt=prompt,
        # verbose=True,
        # max_tokens=8196,
        # temp=0.0,
    )
    print(text)
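
    The same model can also be used for token-by-token streaming via stream_generate. A minimal sketch, assuming a recent mlx-lm release in which stream_generate yields response objects with a .text attribute (older releases yield plain text chunks, so adjust the loop body accordingly):

    from mlx_lm import load, stream_generate

    model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit")

    messages = [
        {"role": "user", "content": "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"},  # "Why is the sky blue?"
    ]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Print each chunk as soon as it is generated instead of waiting for the full answer.
    for response in stream_generate(model, tokenizer, prompt, max_tokens=512):
        print(response.text, end="", flush=True)
    print()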
    
  • OpenAI Compitable HTTP Server

    mlx_lm.server --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit --host 0.0.0.0
    
    import openai

    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",  # the openai client requires a key; the local mlx_lm server does not check it
    )
    
    prompt = "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 μΉœμ ˆν•œ μ±—λ΄‡μž…λ‹ˆλ‹€.",},
        {"role": "user", "content": prompt},
    ]
    res = client.chat.completions.create(
        model='mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit',
        messages=messages,
        temperature=0.2,
    )
    
    print(res.choices[0].message.content)
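
    The server also exposes the standard OpenAI streaming interface. A minimal sketch, assuming the server started above is still running on port 8080 and that the installed mlx-lm release supports streamed chat completions:

    import openai

    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",  # placeholder; the local server does not check it
    )

    stream = client.chat.completions.create(
        model="mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit",
        messages=[{"role": "user", "content": "ν•˜λŠ˜μ΄ νŒŒλž€ μ΄μœ κ°€ 뭐야?"}],  # "Why is the sky blue?"
        temperature=0.2,
        stream=True,
    )

    # Each chunk carries a small delta of the answer; print the deltas as they arrive.
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()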
    