
Shaij PRO

appoose


Recent Activity

liked a model about 3 hours ago

mobiuslabsgmbh/Meta-Llama-3-8B-Instruct_4bitgs64_hqq_hf

liked a model 8 days ago

mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1

upvoted a collection 9 days ago

DeepSeek-R1-ReDistill


appoose's activity

posted an update 6 months ago

Post


Releasing HQQ Llama-3.1-70b 4-bit quantized version! Check it out at mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq.

Achieves 99% of the base model performance across various benchmarks! Details in the model card.

posted an update 6 months ago

Post


Excited to announce the release of our high-quality Llama-3.1 8B 4-bit HQQ/calibrated quantized model! Achieving an impressive 99.3% relative performance to FP16, it also delivers the fastest inference speed for transformers.

mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib


reacted to osanseviero's post with 🔥 11 months ago

Post


Diaries of Open Source. Part 11 🚀

🚀Databricks releases DBRX, potentially the best open access model! A 132B Mixture of Experts with 36B active params, trained on 12 trillion tokens
Blog: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
Base and instruct models: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: https://hf.co/spaces/databricks/dbrx-instruct

🤏1-bit and 2-bit quantization exploration using HQQ+
Blog post: https://mobiusml.github.io/1bit_blog/
Models: https://hf.co/collections/mobiuslabsgmbh/llama2-7b-hqq-6604257a96fc8b9c4e13e0fe
GitHub: https://github.com/mobiusml/hqq

📚Cosmopedia: a large-scale synthetic dataset for pre-training - it includes 25 billion tokens and 30 million files
Dataset: HuggingFaceTB/cosmopedia
Blog: https://hf.co/blog/cosmopedia

⭐Mini-Gemini: multi-modal VLMs, from 2B to 34B
Models: https://hf.co/collections/YanweiLi/mini-gemini-6603c50b9b43d044171d0854
Paper: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (2403.18814)
GitHub: https://github.com/dvlab-research/MiniGemini

🔥VILA - On Pre-training for VLMs
Models: Efficient-Large-Model/vila-on-pre-training-for-visual-language-models-65d8022a3a52cd9bcd62698e
Paper: VILA: On Pre-training for Visual Language Models (2312.07533)

Misc
👀 FeatUp: a framework for image features at any resolution: mhamilton723/FeatUp. Paper: FeatUp: A Model-Agnostic Framework for Features at Any Resolution (2403.10516)
🍞ColBERTus Maximus, a colbertialized embedding model mixedbread-ai/mxbai-colbert-large-v1
🖌️Semantic Palette, a new drawing paradigm ironjr/SemanticPalette
🧑‍⚕️HistoGPT, a vision model that generates accurate pathology reports marr-peng-lab/histogpt https://www.medrxiv.org/content/10.1101/2024.03.15.24304211v1


reacted to macadeliccc's post with 👍❤️ 11 months ago

Post

Quantize 7B parameter models in 60 seconds using Half-Quadratic Quantization (HQQ).

This game-changing technique allows for rapid quantization of models like Llama-2-70B in under 5 minutes, outperforming traditional methods by 50x in speed and offering high-quality compression without calibration data.

Mobius Labs' innovative approach not only significantly reduces memory requirements but also enables the use of large models on consumer-grade GPUs, paving the way for more accessible and efficient machine learning research.

Mobius Labs' method uses a robust optimization formulation to determine the quantization parameters, specifically minimizing the error between the original and dequantized weights. The loss function promotes sparsity in that error by using a non-convex l_p norm with p < 1, which makes the problem challenging yet solvable with a Half-Quadratic solver.

This solver simplifies the problem by introducing an extra variable and dividing the optimization into manageable sub-problems. Their implementation cleverly fixes the scale parameter to simplify calculations and focuses on optimizing the zero-point, utilizing closed-form solutions for each sub-problem to bypass the need for gradient calculations.
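To make the description above concrete, here is a minimal NumPy sketch of a half-quadratic loop of this kind. It is an illustration under stated assumptions, not the HQQ library's code: the function names (`shrink_lp`, `hqq_sketch`), the hyperparameter values (`p`, `beta`, `kappa`, `iters`), and the choice to treat each row of `W` as one quantization group are all assumptions made for this example. The scale is fixed from the min/max range, the extra variable `W_e` is updated with a generalized soft-thresholding step, and the zero-point is updated in closed form as a per-group mean, so no gradients are needed.

```python
import numpy as np

def shrink_lp(x, beta, p):
    """Generalized soft-thresholding for an l_p penalty with p <= 1."""
    mag = np.abs(x)
    # Small epsilon avoids raising 0 to a negative power.
    return np.sign(x) * np.maximum(mag - (1.0 / beta) * np.power(mag + 1e-8, p - 1), 0.0)

def quantize(W, scale, zero, qmax):
    """Map weights to integer levels in [0, qmax]."""
    return np.clip(np.round(W / scale + zero), 0, qmax)

def dequantize(W_q, scale, zero):
    """Map integer levels back to approximate weights."""
    return scale * (W_q - zero)

def hqq_sketch(W, nbits=4, p=0.7, beta=10.0, kappa=1.01, iters=20):
    """Illustrative half-quadratic zero-point optimization (one group per row)."""
    qmax = 2 ** nbits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / qmax, 1e-8)  # fixed throughout
    zero = -w_min / scale                             # min/max starting point

    def mean_abs_error(z):
        return np.abs(W - dequantize(quantize(W, scale, z, qmax), scale, z)).mean()

    init_err = best_err = mean_abs_error(zero)
    best_zero = zero
    for _ in range(iters):
        W_q = quantize(W, scale, zero, qmax)
        W_r = dequantize(W_q, scale, zero)
        # Sub-problem 1: sparse estimate of the quantization error.
        W_e = shrink_lp(W - W_r, beta, p)
        # Sub-problem 2: closed-form zero-point update (mean over the group).
        zero = np.mean(W_q - (W - W_e) / scale, axis=1, keepdims=True)
        beta *= kappa
        err = mean_abs_error(zero)
        if err >= best_err:  # stop once the error no longer improves
            break
        best_err, best_zero = err, zero
    return quantize(W, scale, best_zero, qmax), scale, best_zero, init_err, best_err
```

Calling `hqq_sketch(W)` on a 2-D float array returns the integer levels, the per-row scale and zero-point, and the mean absolute reconstruction error before and after the loop. Because the best zero-point seen so far is kept, the final error is never worse than the min/max starting point.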

Check out the colab demo where you are able to quantize models (text generation and multimodal) for use with vLLM or Timm backend as well as transformers!

AutoHQQ: 👉 https://colab.research.google.com/drive/1cG_5R_u9q53Uond7F0JEdliwvoeeaXVN?usp=sharing
Code: https://github.com/mobiusml/hqq
HQQ Blog post: https://mobiusml.github.io/hqq_blog/

Edit: Here is an example of how powerful HQQ can be: macadeliccc/Nous-Hermes-2-Mixtral-8x7B-DPO-HQQ

Citations:

@misc{badri2023hqq,
title = {Half-Quadratic Quantization of Large Machine Learning Models},
url = {https://mobiusml.github.io/hqq_blog/},
author = {Hicham Badri and Appu Shaji},
month = {November},
year = {2023}
}