Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov) | [Discord](https://discord.gg/pvy7H8DZMG) | [Request more models](https://github.com/RichardErkhov/quant_request)

Hare-1.1B-base - GGUF
- Model creator: https://huggingface.co/LiteAI/
- Original model: https://huggingface.co/LiteAI/Hare-1.1B-base/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Hare-1.1B-base.Q2_K.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q2_K.gguf) | Q2_K | 0.41GB |
| [Hare-1.1B-base.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.IQ3_XS.gguf) | IQ3_XS | 0.45GB |
| [Hare-1.1B-base.IQ3_S.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.IQ3_S.gguf) | IQ3_S | 0.48GB |
| [Hare-1.1B-base.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q3_K_S.gguf) | Q3_K_S | 0.47GB |
| [Hare-1.1B-base.IQ3_M.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.IQ3_M.gguf) | IQ3_M | 0.49GB |
| [Hare-1.1B-base.Q3_K.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q3_K.gguf) | Q3_K | 0.52GB |
| [Hare-1.1B-base.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q3_K_M.gguf) | Q3_K_M | 0.52GB |
| [Hare-1.1B-base.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q3_K_L.gguf) | Q3_K_L | 0.56GB |
| [Hare-1.1B-base.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.IQ4_XS.gguf) | IQ4_XS | 0.58GB |
| [Hare-1.1B-base.Q4_0.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q4_0.gguf) | Q4_0 | 0.61GB |
| [Hare-1.1B-base.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.IQ4_NL.gguf) | IQ4_NL | 0.61GB |
| [Hare-1.1B-base.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q4_K_S.gguf) | Q4_K_S | 0.61GB |
| [Hare-1.1B-base.Q4_K.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q4_K.gguf) | Q4_K | 0.64GB |
| [Hare-1.1B-base.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q4_K_M.gguf) | Q4_K_M | 0.64GB |
| [Hare-1.1B-base.Q4_1.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q4_1.gguf) | Q4_1 | 0.67GB |
| [Hare-1.1B-base.Q5_0.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q5_0.gguf) | Q5_0 | 0.73GB |
| [Hare-1.1B-base.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q5_K_S.gguf) | Q5_K_S | 0.73GB |
| [Hare-1.1B-base.Q5_K.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q5_K.gguf) | Q5_K | 0.74GB |
| [Hare-1.1B-base.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q5_K_M.gguf) | Q5_K_M | 0.74GB |
| [Hare-1.1B-base.Q5_1.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q5_1.gguf) | Q5_1 | 0.79GB |
| [Hare-1.1B-base.Q6_K.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q6_K.gguf) | Q6_K | 0.86GB |
| [Hare-1.1B-base.Q8_0.gguf](https://huggingface.co/RichardErkhov/LiteAI_-_Hare-1.1B-base-gguf/blob/main/Hare-1.1B-base.Q8_0.gguf) | Q8_0 | 1.11GB |

Original model description:
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- Hare
datasets:
- cerebras/SlimPajama-627B
- HuggingFaceTB/cosmopedia
arxiv: 2406.11410
---
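The file sizes in the table above track the bits stored per weight for each quantization method. As a rough sanity check — a sketch that assumes the nominal 1.1B parameter count and uses the rounded GB figures from the table, so the results are approximate — the effective bits per weight can be estimated directly:

```python
# Rough bits-per-weight estimate from the quant table above.
# Assumptions: nominal 1.1e9 parameters; sizes are the rounded
# decimal-GB figures from the table, so results are approximate
# and slightly inflated by non-weight tensors and metadata.

PARAMS = 1.1e9

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB to approximate bits per weight."""
    return size_gb * 1e9 * 8 / params

for name, size_gb in [("Q2_K", 0.41), ("Q4_0", 0.61), ("Q8_0", 1.11)]:
    print(f"{name}: ~{bits_per_weight(size_gb):.1f} bits/weight")
```

The estimates land close to the nominal bit widths in the method names (Q4_0 near 4 bits, Q8_0 near 8 bits), with the excess coming from quantization scales and unquantized layers.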
GitHub | 🤖 ModelScope | 📑 ArXiv
Hare-1.1B-base is a pre-trained model developed by the LiteAI Team at China Telecom Guizhou Branch. It was pre-trained on a mix of high-quality open-source data and strategy-generated synthetic data. The model has only 1.1B parameters and performs well on the Open LLM Leaderboard.

- We chose Mistral as the foundational architecture and reused its tokenizer, reducing the parameter count by adjusting the architecture's hyperparameters. As a result, the model can be used directly with the many open-source projects that support Mistral, such as vLLM.
- With a parameter count of only 1.1 billion, the model can be deployed on consumer-grade GPUs, mobile devices, and other cost-effective platforms.
- We have explored efficient training at FP8 precision and compiled a set of best practices, hoping to contribute what we can to LLM training in the open-source community. For the best practices, please see our GitHub homepage.
- Chinese language support is currently under development.
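The first bullet describes shrinking a Mistral-style architecture by adjusting its hyperparameters. Using the figures from the Model Details table (22 layers, hidden size 2048, 32 attention heads), a back-of-the-envelope parameter count lands near 1.1B. This is a sketch, not the official configuration: the MLP intermediate size (5632), the number of KV heads (4, i.e. grouped-query attention), the vocabulary size (32000), and untied embeddings are all assumptions not stated in the card.

```python
# Back-of-the-envelope parameter count for a Mistral-style decoder.
# ASSUMPTIONS (not stated in the model card): intermediate (MLP)
# size 5632, 4 KV heads (grouped-query attention), vocab size
# 32000, untied input/output embeddings. Hidden size, layer count,
# and head count come from the Model Details table.

hidden, layers, heads, kv_heads = 2048, 22, 32, 4
intermediate, vocab = 5632, 32000
head_dim = hidden // heads

embed = vocab * hidden                        # token embeddings
attn = (hidden * hidden) * 2 \
     + (hidden * kv_heads * head_dim) * 2     # q/o plus k/v projections
mlp = 3 * hidden * intermediate               # gate, up, down projections
per_layer = attn + mlp
lm_head = vocab * hidden                      # output projection (untied)

total = embed + layers * per_layer + lm_head
print(f"~{total / 1e9:.2f}B parameters")      # ~1.10B parameters
```

Under these assumptions the total comes out at roughly 1.10B, consistent with the advertised model size.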
## Model Details

| Model | Training Tokens | Hidden Layers | Hidden Size | Attention Heads | Context Length |
|:------:|:--------:|:---------:|:-------------:|:-----------------:|:----------------:|
| Hare-1.1B-base | ~600B | 22 | 2048 | 32 | 2048 |

## Model Description

- **Developed by:** LiteAI Team
- **Institution:** China Telecom Guizhou Branch
- **Model size:** 1.1B
- **License:** Apache 2.0

## Uses

### Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "LiteAI-Team/Hare-1.1B-base"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.to(device)

prompt = "Write a poem based on the landscape of Guizhou:"
tokens = tokenizer(prompt, add_special_tokens=True, return_tensors='pt').to(device)
output = model.generate(**tokens, max_new_tokens=128)

# Strip the prompt tokens and decode only the generated continuation.
output_tokens = output[0].cpu().numpy()[tokens.input_ids.size(1):]
output_string = tokenizer.decode(output_tokens)
print(output_string)

>> """The Guizhou landscape is a sight to behold,
A place where nature's beauty is unmatched,
A land of towering mountains and vast plains,
A paradise for those who seek to explore.

The mountains rise high above the sky,
A sight to beholder, a sight to see,
The valleys stretch out as far as the eye can see,
A landscape of endless beauty and grace."""
```

Install with vLLM:

```shell
pip install vllm
```

```python
from vllm import LLM, SamplingParams

model_path = "LiteAI-Team/Hare-1.1B-base"
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=4)

query = "Write a poem based on the landscape of Guizhou:"
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(query, sampling_params)
print(outputs)
```

## Edge Deployment Demo

Our model has only 1.1 billion parameters, and after Int4 quantization it occupies just 0.6GB, allowing for easy deployment on mobile devices. The [Hare-1.1B-Chat](https://huggingface.co/LiteAI/Hare-1.1B-Chat) model weights have been open-sourced.

- Android: We chose MLC-LLM as the deployment framework and conducted deployment testing of the Chat model on the Redmi K40.
- iOS & HarmonyOS: We will conduct deployment testing on these devices in the future.
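The 0.6GB figure for Int4 deployment follows directly from the parameter count. As a quick check — a sketch that ignores the per-group quantization scales and zero-points real Int4 formats store, which add a few percent of overhead:

```python
# Storage estimate for Int4 weights: 4 bits = 0.5 bytes per parameter.
# This ignores per-group quantization scales/zeros, which add a small
# overhead in practice, so the real file is slightly larger.

params = 1.1e9
bytes_total = params * 4 / 8           # 4 bits per weight
print(f"~{bytes_total / 1e9:.2f} GB")  # ~0.55 GB, consistent with the ~0.6GB claim
```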