
Good and Small Models for Mobile Devices

Try them out in Privacy AI on the App Store.

Privacy AI is a lightweight, serverless application. All tools - including web search, stock quotes, and Health analysis - run on-device, keeping data and actions fully private. It supports both local AI models and connections to your own OpenAI-compatible servers.
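Because the app accepts any OpenAI-compatible endpoint, a request to your own server takes the standard chat-completions shape. A minimal Python sketch (the base URL and model name below are placeholders, not values from this card):

```python
import json

# Build a chat-completions request body for an OpenAI-compatible server.
# BASE_URL and the model name are hypothetical - substitute your own
# server's address and whichever model it actually serves.
BASE_URL = "http://localhost:8080/v1"  # placeholder local server
payload = {
    "model": "Qwen3-4B-Instruct-2507-Q4_0",  # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}
body = json.dumps(payload).encode("utf-8")

# To actually send the request (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions", data=body,
#     headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```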

For more information, see the Privacy AI official site.

Qwen3 4B Instruct 2507

Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.

Model Intention: Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-4B-Instruct-2507-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507

Model License: License Info

Model Description: Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.

Developer: https://huggingface.co/Qwen

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 4B Thinking 2507

Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.

Model Intention: Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-4B-Thinking-2507-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507

Model License: License Info

Model Description: Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.

Developer: https://huggingface.co/Qwen

File Size: 2100 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


GLM Edge 4B Chat

GLM-4 is the latest generation of pre-trained models in the GLM-4 series from Zhipu AI. In evaluations across semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. Beyond multi-turn conversation, GLM-4-Chat offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation also adds multilingual support, covering 26 languages including Japanese, Korean, and German.

Model Intention: The latest generation of pre-trained models in the GLM-4 series from Zhipu AI

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf

Model Info URL: https://huggingface.co/THUDM

Model License: License Info

Model Description: GLM-4 is the latest generation of pre-trained models in the GLM-4 series from Zhipu AI. In evaluations across semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. Beyond multi-turn conversation, GLM-4-Chat offers advanced features such as web browsing, code execution, custom tool calling (Function Call), and long-text reasoning (supporting up to 128K context). This generation also adds multilingual support, covering 26 languages including Japanese, Korean, and German.

Developer: https://huggingface.co/THUDM

File Size: 2627 MB

Context Length: 1024 tokens

Prompt Format:

{% for item in messages %}{% if item['role'] == 'system' %}<|system|>
{{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
{{ item['content'] }}{% elif item['role'] == 'assistant' %}<|assistant|>
{{ item['content'] }}{% endif %}{% endfor %}{% if add_generation_prompt %}<|assistant|>
{% endif %}
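For reference, the Jinja template above can be mirrored in plain Python to see the exact prompt string it produces. This is a sketch for illustration, not the app's implementation:

```python
def render_glm(messages, add_generation_prompt=True):
    """Plain-Python equivalent of the GLM Jinja chat template above."""
    out = []
    for item in messages:
        # Each recognized role becomes "<|role|>\n" followed by the content.
        if item["role"] in ("system", "user", "assistant"):
            out.append(f"<|{item['role']}|>\n{item['content']}")
    if add_generation_prompt:
        # Trailing open assistant tag cues the model to respond.
        out.append("<|assistant|>\n")
    return "".join(out)

prompt = render_glm([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is GLM-4?"},
])
# prompt == "<|system|>\nYou are a helpful assistant.<|user|>\nWhat is GLM-4?<|assistant|>\n"
```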

Template Name: glm

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Gemma 3n E2B it

Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input - text, image, video, and audio - and generate text output, with open weights for both pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.

Model Intention: Gemma 3n models are designed for efficient execution on low-resource devices and accept multimodal input (text, image, video, and audio)

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf

Model Info URL: https://huggingface.co/google/gemma-3n-E2B-it

Model License: License Info

Model Description: Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input - text, image, video, and audio - and generate text output, with open weights for both pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.

Developer: https://huggingface.co/google

Update Date: 2025-06-27

File Size: 2720 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


SmolLM3 3B

SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. It is a decoder-only transformer using GQA and NoPE (in a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included mid-training on 140B reasoning tokens.

Model Intention: SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf

Model Info URL: https://huggingface.co/HuggingFaceTB/SmolLM3-3B

Model License: License Info

Model Description: SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. It is a decoder-only transformer using GQA and NoPE (in a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included mid-training on 140B reasoning tokens.

Developer: https://huggingface.co/HuggingFaceTB

File Size: 1920 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Phi4 mini 4B

Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. It is intended for broad multilingual commercial and research use, particularly for general-purpose AI systems and applications that require (1) memory/compute-constrained environments, (2) latency-bound scenarios, or (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models and to serve as a building block for generative-AI-powered features.

Model Intention: Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf

Model Info URL: https://huggingface.co/microsoft/Phi-4-mini-instruct

Model License: License Info

Model Description: Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. It is intended for broad multilingual commercial and research use, particularly for general-purpose AI systems and applications that require (1) memory/compute-constrained environments, (2) latency-bound scenarios, or (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models and to serve as a building block for generative-AI-powered features.

Developer: https://huggingface.co/microsoft

File Size: 2020 MB

Context Length: 2048 tokens

Prompt Format:

{% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
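The Jinja template above can likewise be mirrored in plain Python to see the prompt string it yields. A sketch for illustration only; the `eos_token` default is an assumption, since the real value comes from the model's tokenizer:

```python
def render_phi4(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Plain-Python equivalent of the Phi-4 Jinja chat template above.

    The eos_token default is an assumption for illustration; the actual
    token is defined by the tokenizer configuration.
    """
    out = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            # System messages may carry a tool specification block.
            out.append(
                f"<|{m['role']}|>{m['content']}<|tool|>{m['tools']}<|/tool|><|end|>"
            )
        else:
            out.append(f"<|{m['role']}|>{m['content']}<|end|>")
    # Open assistant tag to generate, or EOS to close the transcript.
    out.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(out)

prompt = render_phi4([
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
])
# prompt == "<|system|>Be concise.<|end|><|user|>Hi<|end|><|assistant|>"
```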

Template Name: llama3.2

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 1.7B

Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.

Model Intention: The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-1.7B

Model License: License Info

Model Description: Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.

Developer: https://huggingface.co/Qwen

File Size: 1110 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


ERNIE-4.5 0.3B

ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: (1) Multimodal Heterogeneous MoE Pre-Training; (2) Scaling-Efficient Infrastructure; (3) Modality-Specific Post-Training.

Model Intention: ERNIE-4.5-0.3B-Base is a dense text base model, suitable for testing the model architecture.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf

Model Info URL: https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT

Model License: License Info

Model Description: ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: (1) Multimodal Heterogeneous MoE Pre-Training; (2) Scaling-Efficient Infrastructure; (3) Modality-Specific Post-Training.

Developer: https://huggingface.co/baidu

File Size: 233 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


LFM2.5 1.2B Instruct

LFM2.5-1.2B-Instruct is Liquid AI's latest 1.2B parameter hybrid model with extended pre-training (28T tokens) and reinforcement learning, designed for on-device deployment. It rivals much larger models with fast edge inference (239 tok/s on AMD CPU, 82 tok/s on mobile NPU) while running under 1GB memory. Features include function calling with custom tool tags, 32K context window, and multilingual support across 8 languages. The model excels at agentic tasks, data extraction, RAG workflows, and multi-turn conversations, making it ideal for mobile and edge AI applications.

Model Intention: Best-in-class 1.2B hybrid model optimized for agentic tasks, data extraction, RAG, and fast edge inference with tool calling support

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2.5-1.2B-Instruct-Q4_0.gguf

Model Info URL: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

Model License: License Info

Model Description: LFM2.5-1.2B-Instruct is Liquid AI's latest 1.2B parameter hybrid model with extended pre-training (28T tokens) and reinforcement learning, designed for on-device deployment. It rivals much larger models with fast edge inference (239 tok/s on AMD CPU, 82 tok/s on mobile NPU) while running under 1GB memory. Features include function calling with custom tool tags, 32K context window, and multilingual support across 8 languages. The model excels at agentic tasks, data extraction, RAG workflows, and multi-turn conversations, making it ideal for mobile and edge AI applications.

Developer: https://huggingface.co/LiquidAI

File Size: 700 MB

Context Length: 8000 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Jan v1 4B

Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.

Model Intention: Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf

Model Info URL: https://huggingface.co/janhq/Jan-v1-4B

Model License: License Info

Model Description: Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.

Developer: https://huggingface.co/janhq

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Menlo Lucy 1.7B

Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.

Model Intention: Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Menlo_Lucy-Q4_K_M.gguf

Model Info URL: https://huggingface.co/Menlo/Lucy

Model License: License Info

Model Description: Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.

Developer: https://huggingface.co/Menlo

File Size: 1056 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Nemotron 1.5B

OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is post-trained for reasoning about math, code, and science solution generation. This model is ready for commercial and non-commercial research use.

Model Intention: It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/OpenReasoning-Nemotron-1.5B-Q4_K_M.gguf

Model Info URL: https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B

Model License: License Info

Model Description: OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is post-trained for reasoning about math, code, and science solution generation. This model is ready for commercial and non-commercial research use.

Developer: https://huggingface.co/nvidia

File Size: 940 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 1.7B Uncensored

Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.

Model Intention: An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Uncensored.gguf

Model Info URL: https://huggingface.co/DavidAU/Qwen3-1.7B-HORROR-Imatrix-Max-GGUF

Model License: License Info

Model Description: Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.

Developer: https://huggingface.co/DavidAU

File Size: 1110 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Gemma 3 270M

Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.

Model Intention: Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3-270m-q4_0.gguf

Model Info URL: https://huggingface.co/google/gemma-3-270m

Model License: License Info

Model Description: Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.

Developer: https://huggingface.co/google

File Size: 245 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: gemma

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Youtu-LLM 2B

Youtu-LLM-2B is Tencent's compact yet powerful 2B parameter model designed for agentic applications with native Chain of Thought (CoT) reasoning. Despite its small size, it delivers impressive performance on complex tasks including coding (95.9% on HumanEval), mathematics (93.7% on MATH-500), and agent tasks. The model features 128K context length, supports multiple languages, and excels at tool use, deep research, and code generation. Its reasoning mode enables step-by-step problem solving for complex queries while maintaining high efficiency for on-device deployment.

Model Intention: Compact 2B agentic model with native reasoning capabilities, optimized for agent tasks, tool use, and complex problem-solving

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Youtu-LLM-2B.i1-Q4_0.gguf

Model Info URL: https://huggingface.co/tencent/Youtu-LLM-2B

Model License: License Info

Model Description: Youtu-LLM-2B is Tencent's compact yet powerful 2B parameter model designed for agentic applications with native Chain of Thought (CoT) reasoning. Despite its small size, it delivers impressive performance on complex tasks including coding (95.9% on HumanEval), mathematics (93.7% on MATH-500), and agent tasks. The model features 128K context length, supports multiple languages, and excels at tool use, deep research, and code generation. Its reasoning mode enables step-by-step problem solving for complex queries while maintaining high efficiency for on-device deployment.

Developer: https://huggingface.co/tencent

File Size: 1200 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


LFM2.5-VL 1.6B

LFM2.5-VL-1.6B is Liquid AI's refreshed vision-language model built on the LFM2.5-1.2B-Base backbone with a SigLIP2 NaFlex vision encoder (400M parameters). It features enhanced instruction following, improved multilingual vision understanding across 8 languages, robust visual content processing with multi-image support, high-resolution handling, and OCR capabilities. The model processes images up to 512×512 pixels with aspect ratio preservation and a tiling strategy for larger images. With a 32K context window and 1.6B parameters (2B total with the vision encoder), it excels at general vision-language workloads, document comprehension, and multi-image reasoning, making it ideal for edge AI applications.

Model Intention: Enhanced multimodal vision-language model with improved instruction following, multilingual vision understanding, and robust visual content processing including OCR

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2.5-VL-1.6B-Q4_0.gguf

Model Info URL: https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B

Model License: License Info

Model Description: LFM2.5-VL-1.6B is Liquid AI's refreshed vision-language model built on the LFM2.5-1.2B-Base backbone with a SigLIP2 NaFlex vision encoder (400M parameters). It features enhanced instruction following, improved multilingual vision understanding across 8 languages, robust visual content processing with multi-image support, high-resolution handling, and OCR capabilities. The model processes images up to 512×512 pixels with aspect ratio preservation and a tiling strategy for larger images. With a 32K context window and 1.6B parameters (2B total with the vision encoder), it excels at general vision-language workloads, document comprehension, and multi-image reasoning, making it ideal for edge AI applications.

Developer: https://huggingface.co/LiquidAI

File Size: 950 MB

Context Length: 8000 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 4B Instruct

Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.

Model Intention: Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Instruct-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct

Model License: License Info

Model Description: Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.

Developer: https://huggingface.co/Qwen

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 4B Thinking

Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.

Model Intention: Advanced multimodal reasoning model with thinking mode for complex visual reasoning, mathematics, and scientific tasks

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Thinking-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking

Model License: License Info

Model Description: Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.

Developer: https://huggingface.co/Qwen

File Size: 2100 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 2B Instruct

Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.

Model Intention: Compact multimodal vision-language model with enhanced instruction following, optimized for efficient deployment

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-2B-Instruct-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct

Model License: License Info

Model Description: Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.

Developer: https://huggingface.co/Qwen

File Size: 1300 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Youtu-VL 4B Instruct

Youtu-VL-4B-Instruct is a lightweight vision-language model with 4B parameters, built on Youtu-LLM with a focus on multimodal understanding. It supports vision-centric tasks (visual grounding, object detection, segmentation, depth estimation, pose estimation) and general multimodal tasks (VQA, reasoning, mathematics, OCR, multi-image understanding). It features a unified architecture with Vision-Language Unified Autoregressive Supervision (VLUAS) that handles both dense vision prediction and text-based prediction without task-specific modules.

Model Intention: Lightweight multimodal vision-language model with unified architecture for vision-centric and vision-language tasks

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Youtu-VL-4B-Instruct.Q4_K_S.gguf

Model Info URL: https://huggingface.co/tencent/Youtu-VL-4B-Instruct

Model License: License Info

Model Description: Youtu-VL-4B-Instruct is a lightweight vision-language model with 4B parameters, built on Youtu-LLM with a focus on multimodal understanding. It supports vision-centric tasks (visual grounding, object detection, segmentation, depth estimation, pose estimation) and general multimodal tasks (VQA, reasoning, mathematics, OCR, multi-image understanding). It features a unified architecture with Vision-Language Unified Autoregressive Supervision (VLUAS) that handles both dense vision prediction and text-based prediction without task-specific modules.

Developer: https://youtu-tip.com/#llm

Update Date: 2025-01-30

Update History: 2025-01-30: Initial release with VLUAS architecture

File Size: 2500 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Ministral 3 3B Instruct 2512

Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.

Model Intention: Multimodal vision-language model with enhanced instruction following, optimized for efficient deployment and visual understanding

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Instruct-2512-Q4_0.gguf

Model Info URL: https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512

Model License: License Info

Model Description: Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.

Developer: https://huggingface.co/mistralai

File Size: 1900 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Ministral 3 3B Reasoning 2512

Ministral-3-3B-Reasoning-2512 is a compact reasoning-focused language model with 3.4B parameters, designed for edge deployment with enhanced multi-step reasoning capabilities. It excels at complex reasoning tasks including mathematics (MATH Maj@1: 83.0%), coding (LiveCodeBench: 54.8%), and scientific problem-solving (GPQA Diamond: 53.4%). With 256K context window and multilingual support for dozens of languages, it provides strong reasoning performance while maintaining efficient resource usage for mobile and edge devices.

Model Intention: Compact reasoning model with enhanced multi-step reasoning, mathematics, and scientific problem-solving capabilities

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Reasoning-2512-Q4_0.gguf

Model Info URL: https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512

Model License: License Info

Model Description: Ministral-3-3B-Reasoning-2512 is a compact reasoning-focused language model with 3.4B parameters, designed for edge deployment with enhanced multi-step reasoning capabilities. It excels at complex reasoning tasks including mathematics (MATH Maj@1: 83.0%), coding (LiveCodeBench: 54.8%), and scientific problem-solving (GPQA Diamond: 53.4%). With a 256K context window and multilingual support for dozens of languages, it provides strong reasoning performance while maintaining efficient resource usage for mobile and edge devices.

Developer: https://huggingface.co/mistralai

File Size: 2100 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Test Corrupted Model

Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Model Intention: Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities up to 128K context

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/test.gguf

Model Info URL: https://huggingface.co/flyingfishinwater

Model License: License Info

Model Description: Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Developer: https://huggingface.co/Qwen

File Size: 19 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
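A corrupted-download test like the entry above can be approximated with a cheap header check: every valid GGUF file begins with the ASCII magic `GGUF` followed by a little-endian uint32 version. The helper below is a sketch of such a pre-flight check, not code from Privacy AI itself.

```python
import struct

GGUF_MAGIC = b"GGUF"

def looks_like_gguf(path):
    """Return True if the file starts with the GGUF magic bytes and a
    plausible version number.

    A cheap pre-flight check an app can run after download to catch
    truncated or corrupted files before attempting a full load."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    # Bytes 4-7 hold the GGUF version as a little-endian uint32
    version = struct.unpack("<I", header[4:8])[0]
    return version >= 1
```

This only validates the header; a full integrity check would also verify the file size or a checksum against the hosting server.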


Test Vision Model

Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Model Intention: Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities up to 128K context

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/test.gguf

Model Info URL: https://huggingface.co/flyingfishinwater

Model License: License Info

Model Description: Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Developer: https://huggingface.co/Qwen

File Size: 19 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Test Non Exist Model

Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Model Intention: Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities up to 128K context

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/test-non-exist.gguf

Model Info URL: https://huggingface.co/flyingfishinwater

Model License: License Info

Model Description: Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.

Developer: https://huggingface.co/Qwen

File Size: 19 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
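The Model URL fields in this catalog all follow Hugging Face's `/resolve/<revision>/<filename>` download pattern. A minimal helper for composing such URLs might look like this; a client should still expect an HTTP 404 for deliberately missing files such as the test entry above.

```python
def hf_resolve_url(repo_id, filename, revision="main"):
    """Compose a Hugging Face direct-download URL of the form
    https://huggingface.co/<repo>/resolve/<revision>/<file>,
    the pattern used by the Model URL fields in this catalog."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
```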


Qwen3 0.6B MLX

Qwen3-0.6B-MLX is a compact 0.6B parameter model optimized for Apple Silicon using the MLX framework. It provides fast inference with minimal resource usage while maintaining strong performance for text generation, reasoning, and function calling. It is ideal for mobile deployment, testing scenarios, and applications that require quick responses with efficient memory usage.

Model Intention: Ultra-compact 0.6B parameter MLX-optimized model for efficient on-device inference with fast predictions and function calls

Model URL: https://huggingface.co/mlx-community/Qwen3-0.6B-4bit

Model Info URL: https://huggingface.co/mlx-community/Qwen3-0.6B-4bit

Model License: License Info

Model Description: Qwen3-0.6B-MLX is a compact 0.6B parameter model optimized for Apple Silicon using the MLX framework. It provides fast inference with minimal resource usage while maintaining strong performance for text generation, reasoning, and function calling. It is ideal for mobile deployment, testing scenarios, and applications that require quick responses with efficient memory usage.

Developer: https://huggingface.co/mlx-community

File Size: 353 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen2.5-VL 3B Instruct MLX

Qwen2.5-VL-3B-Instruct-MLX is a multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 3B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, coding, mathematics, and multilingual understanding with up to 128K context length, and is optimized specifically for efficient on-device inference on Apple devices.

Model Intention: Multimodal vision-language model optimized for Apple Silicon with MLX, supporting both image and text processing with enhanced reasoning

Model URL: https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-4bit

Model Info URL: https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-4bit

Model License: License Info

Model Description: Qwen2.5-VL-3B-Instruct-MLX is a multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 3B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, coding, mathematics, and multilingual understanding with up to 128K context length, and is optimized specifically for efficient on-device inference on Apple devices.

Developer: https://huggingface.co/mlx-community

File Size: 3070 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Granite 4.0 H Micro MLX

Granite 4.0 H Micro is a 3B parameter long-context instruct model from IBM's Granite team, aligned through supervised finetuning, reinforcement learning, and model merging. The MLX 4-bit conversion preserves the 128K context window, safety-aligned default system prompt, multilingual coverage, and advanced tool-calling support, making it ideal for privacy-first enterprise assistants running on Apple Silicon devices.

Model Intention: Long-context 3B Granite 4.0 instruct model tuned for enterprise copilots with strong tool execution while fitting on-device memory budgets

Model URL: https://huggingface.co/mlx-community/granite-4.0-h-micro-4bit

Model Info URL: https://huggingface.co/ibm-granite/granite-4.0-h-micro

Model License: License Info

Model Description: Granite 4.0 H Micro is a 3B parameter long-context instruct model from IBM's Granite team, aligned through supervised finetuning, reinforcement learning, and model merging. The MLX 4-bit conversion preserves the 128K context window, safety-aligned default system prompt, multilingual coverage, and advanced tool-calling support, making it ideal for privacy-first enterprise assistants running on Apple Silicon devices.

Developer: https://huggingface.co/ibm-granite

Update Date: 2025-10-02

Update History: 2025-10-02: MLX 4-bit conversion published using mlx-lm 0.28.2

File Size: 620 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: granite

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 1.7B MLX

Qwen3-1.7B-MLX is a compact 1.7B parameter model optimized for Apple Silicon using the MLX framework. It provides fast inference with minimal resource usage while maintaining strong performance for text generation, reasoning, and function calling. The model supports dynamic thinking mode control and excels at instruction following, making it ideal for mobile deployment, testing scenarios, and applications requiring quick responses with efficient memory usage.

Model Intention: Compact 1.7B parameter MLX-optimized model for efficient on-device inference with fast predictions and function calls

Model URL: https://huggingface.co/mlx-community/Qwen3-1.7B-4bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-1.7B

Model License: License Info

Model Description: Qwen3-1.7B-MLX is a compact 1.7B parameter model optimized for Apple Silicon using the MLX framework. It provides fast inference with minimal resource usage while maintaining strong performance for text generation, reasoning, and function calling. The model supports dynamic thinking mode control and excels at instruction following, making it ideal for mobile deployment, testing scenarios, and applications requiring quick responses with efficient memory usage.

Developer: https://huggingface.co/mlx-community

File Size: 1100 MB

Context Length: 4096 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 4B MLX

Qwen3-4B-MLX is a 4B parameter model optimized for Apple Silicon using the MLX framework. It delivers strong performance in reasoning, mathematics, coding, and instruction-following tasks. With a 32K context length and multilingual support, this model provides an excellent balance between capability and efficiency for on-device deployment. The MLX optimization ensures fast inference while maintaining high-quality text generation across diverse tasks.

Model Intention: 4B parameter MLX-optimized model with enhanced reasoning, instruction following, and multilingual capabilities

Model URL: https://huggingface.co/mlx-community/Qwen3-4B-4bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B

Model License: License Info

Model Description: Qwen3-4B-MLX is a 4B parameter model optimized for Apple Silicon using the MLX framework. It delivers strong performance in reasoning, mathematics, coding, and instruction-following tasks. With a 32K context length and multilingual support, this model provides an excellent balance between capability and efficiency for on-device deployment. The MLX optimization ensures fast inference while maintaining high-quality text generation across diverse tasks.

Developer: https://huggingface.co/mlx-community

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
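The Context Length values in these entries are configured well below each model's maximum largely because KV-cache memory grows linearly with context length. The estimate below illustrates why; the layer, head, and dimension numbers are hypothetical and not read from any specific checkpoint.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Rough KV-cache size: keys and values (the factor of 2) stored per
    layer, per KV head, per head dimension, per context position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# A hypothetical 36-layer model with 8 KV heads of dim 128 at a
# 2048-token configured context, stored in fp16 (2 bytes per element):
mb = kv_cache_bytes(36, 8, 128, 2048, 2) / (1024 * 1024)  # 288.0 MB
```

Doubling the context to 4096 tokens would double that figure, which matters on phones where the model weights already consume most of the memory budget.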


Qwen3 4B Thinking MLX

Qwen3-4B-Thinking-2507-MLX is a specialized 4B parameter reasoning model optimized for Apple Silicon using the MLX framework. It features enhanced thinking mode capabilities, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks. The thinking mode enables step-by-step problem solving with a 32K context length, making it ideal for applications requiring deep analytical capabilities and complex problem-solving.

Model Intention: Advanced reasoning MLX-optimized model with thinking mode for complex logical reasoning, mathematics, and scientific tasks

Model URL: https://huggingface.co/mlx-community/Qwen3-4B-Thinking-2507-4bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507

Model License: License Info

Model Description: Qwen3-4B-Thinking-2507-MLX is a specialized 4B parameter reasoning model optimized for Apple Silicon using the MLX framework. It features enhanced thinking mode capabilities, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks. The thinking mode enables step-by-step problem solving with a 32K context length, making it ideal for applications requiring deep analytical capabilities and complex problem-solving.

Developer: https://huggingface.co/mlx-community

File Size: 2100 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


LFM2.5 1.2B Instruct MLX

LFM2.5-1.2B-Instruct-MLX-4bit is the MLX conversion of Liquid AI's latest 1.2B hybrid model with extended pre-training (28T tokens) and reinforcement learning. The 4-bit quantization preserves the model's strong performance on agentic tasks, data extraction, RAG workflows, and multi-turn conversations while fitting efficiently in on-device memory. It features function calling with custom tool tags, a 32K context window, and multilingual support across 8 languages, rivaling much larger models with optimized inference on Apple Silicon.

Model Intention: Best-in-class 1.2B hybrid model optimized for agentic tasks, data extraction, RAG, and fast edge inference with tool calling support

Model URL: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-4bit

Model Info URL: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct

Model License: License Info

Model Description: LFM2.5-1.2B-Instruct-MLX-4bit is the MLX conversion of Liquid AI's latest 1.2B hybrid model with extended pre-training (28T tokens) and reinforcement learning. The 4-bit quantization preserves the model's strong performance on agentic tasks, data extraction, RAG workflows, and multi-turn conversations while fitting efficiently in on-device memory. It features function calling with custom tool tags, a 32K context window, and multilingual support across 8 languages, rivaling much larger models with optimized inference on Apple Silicon.

Developer: https://huggingface.co/LiquidAI

File Size: 700 MB

Context Length: 8000 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
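Function calling as described above usually ends with the host app parsing a structured call and dispatching it to a local tool. The sketch below assumes the call has already been isolated as a JSON object; the surrounding tool tags are defined by the model's own chat template, and the stock-quote handler is purely hypothetical.

```python
import json

# Hypothetical registry mapping tool names to local handlers. In an
# on-device app like Privacy AI these would be native functions.
TOOLS = {
    "get_stock_quote": lambda symbol: {"symbol": symbol, "price": 123.45},
}

def dispatch_tool_call(payload):
    """Parse a {"name": ..., "arguments": {...}} payload and run the
    matching registered tool."""
    call = json.loads(payload)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])
```

The tool's return value is then serialized back into a tool-role message and appended to the conversation for the model's next turn.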


LFM2.5-VL 1.6B MLX

LFM2.5-VL-1.6B-MLX-4bit is the MLX conversion of Liquid AI's vision-language model built on the LFM2.5-1.2B-Base backbone with SigLIP2 NaFlex vision encoder (400M parameters). The 4-bit quantization enables efficient on-device inference on Apple Silicon while retaining enhanced instruction following, multilingual vision understanding across 8 languages, and robust visual content processing with multi-image support, high-resolution handling, and OCR capabilities. With a 32K context window and 1.6B parameters (2B total with vision encoder), it excels at general vision-language workloads, document comprehension, and multi-image reasoning.

Model Intention: Enhanced multimodal vision-language model with improved instruction following, multilingual vision understanding, and robust visual content processing including OCR

Model URL: https://huggingface.co/mlx-community/LFM2.5-VL-1.6B-4bit

Model Info URL: https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B

Model License: License Info

Model Description: LFM2.5-VL-1.6B-MLX-4bit is the MLX conversion of Liquid AI's vision-language model built on the LFM2.5-1.2B-Base backbone with SigLIP2 NaFlex vision encoder (400M parameters). The 4-bit quantization enables efficient on-device inference on Apple Silicon while retaining enhanced instruction following, multilingual vision understanding across 8 languages, and robust visual content processing with multi-image support, high-resolution handling, and OCR capabilities. With a 32K context window and 1.6B parameters (2B total with vision encoder), it excels at general vision-language workloads, document comprehension, and multi-image reasoning.

Developer: https://huggingface.co/LiquidAI

File Size: 1100 MB

Context Length: 8000 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 4B Instruct MLX

Qwen3-VL-4B-Instruct-MLX is a multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 4B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, coding, mathematics, and multilingual understanding, and is optimized specifically for efficient on-device inference on Apple devices.

Model Intention: Multimodal vision-language model optimized for Apple Silicon with MLX, supporting both image and text processing

Model URL: https://huggingface.co/lmstudio-community/Qwen3-VL-4B-Instruct-MLX-4bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct

Model License: License Info

Model Description: Qwen3-VL-4B-Instruct-MLX is a multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 4B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, coding, mathematics, and multilingual understanding, and is optimized specifically for efficient on-device inference on Apple devices.

Developer: https://huggingface.co/lmstudio-community

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 4B Thinking MLX

Qwen3-VL-4B-Thinking-MLX is a specialized multimodal vision-language reasoning model optimized for Apple Silicon using the MLX framework. It features enhanced thinking mode capabilities, providing significantly improved performance on complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, optimized for efficient on-device inference.

Model Intention: Advanced multimodal reasoning MLX-optimized model with thinking mode for complex visual reasoning tasks

Model URL: https://huggingface.co/mlx-community/Qwen3-VL-4B-Thinking-4bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking

Model License: License Info

Model Description: Qwen3-VL-4B-Thinking-MLX is a specialized multimodal vision-language reasoning model optimized for Apple Silicon using the MLX framework. It features enhanced thinking mode capabilities, providing significantly improved performance on complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, optimized for efficient on-device inference.

Developer: https://huggingface.co/mlx-community

File Size: 2100 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3-VL 2B Instruct MLX

Qwen3-VL-2B-Instruct-MLX is a compact multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 2B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, instruction following, and multilingual understanding, and is optimized specifically for efficient on-device inference on Apple devices with 8-bit quantization.

Model Intention: Compact multimodal vision-language model optimized for Apple Silicon with MLX, supporting both image and text processing

Model URL: https://huggingface.co/lmstudio-community/Qwen3-VL-2B-Instruct-MLX-8bit

Model Info URL: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct

Model License: License Info

Model Description: Qwen3-VL-2B-Instruct-MLX is a compact multimodal vision-language model optimized for Apple Silicon using the MLX framework. It combines a 2B-parameter language model with vision capabilities, enabling both image-to-text and text-to-text processing. The model supports enhanced reasoning, instruction following, and multilingual understanding, and is optimized specifically for efficient on-device inference on Apple devices with 8-bit quantization.

Developer: https://huggingface.co/lmstudio-community

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Test MLX

Test MLX download and load

Model Intention: Test MLX download and load

Model URL: https://huggingface.co/flyingfishinwater/test_mlx

Model Info URL: https://huggingface.co/flyingfishinwater/test_mlx

Model License: License Info

Model Description: Test MLX download and load

Developer: https://huggingface.co/mlx-community

File Size: 20 MB

Context Length: 2048 tokens

Prompt Format:


Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Nanbeige4 3B Thinking

Nanbeige4-3B-Thinking-2511 is a compact 3B parameter reasoning model with built-in chain-of-thought thinking capabilities. It features native tool-calling support with custom tags, making it suitable for agentic workflows. The model balances reasoning quality with efficiency, enabling complex problem solving within a small footprint for on-device inference.

Model Intention: Compact 3B reasoning model with chain-of-thought thinking and native tool-calling support, optimized for agentic tasks

Model URL: https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Nanbeige4-3B-Thinking-2511.Q4_K_M.gguf

Model Info URL: https://huggingface.co/Nanbeige/Nanbeige4-3B-Thinking-2511

Model License: License Info

Model Description: Nanbeige4-3B-Thinking-2511 is a compact 3B parameter reasoning model with built-in chain-of-thought thinking capabilities. It features native tool-calling support with custom tags, making it suitable for agentic workflows. The model balances reasoning quality with efficiency, enabling complex problem solving within a small footprint for on-device inference.

Developer: https://huggingface.co/Nanbeige

File Size: 2000 MB

Context Length: 4000 tokens

Prompt Format:


Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
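Thinking models like this one interleave chain-of-thought with the final answer, and a host app typically strips the reasoning span before display. The `<think>...</think>` tag pair below follows the convention used by several open reasoning models; a given checkpoint may use different delimiters, so treat this as an illustrative sketch.

```python
import re

def strip_thinking(text):
    """Remove <think>...</think> spans from a thinking model's output
    before display. The tag names are an assumption based on common
    open-model conventions, not read from any specific template."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()
```

Streaming UIs often do the inverse as well, showing the thinking span in a collapsible panel while only the stripped text goes into the chat history.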

