|
ANEMLL |
|
|
|
ANEMLL (pronounced like “animal”) is an open-source project |
|
focused on accelerating the porting of Large Language Models (LLMs) |
|
to tensor processors, starting with the Apple Neural Engine (ANE). |
|
|
|
The goal is to provide a fully open-source pipeline |
|
from model conversion to inference for common LLM architectures |
|
running on ANE. |
|
|
|
This enables seamless integration and on-device inference |
|
for low-power applications on edge devices, |
|
ensuring maximum privacy and security. |
|
|
|
This is critical for autonomous applications, |
|
where models run directly on the device |
|
without requiring an internet connection. |
|
|
|
License |
|
|
|
ANEMLL is licensed under the MIT License. |
|
https://opensource.org/license/mit |
|
The model is based on Meta’s LLaMA 3.2 and may require a separate license. |
|
|
|
|
|
|
|
This test model is Meta's LLaMA 3.2 1B (1024-token context) converted for CoreML.

It was released before the official launch of the ANEMLL repository and ships with minimal documentation.

It is intended only for early adopters who requested an early release.
|
|
|
|
|
Requirements |
|
• macOS Sequoia with Apple Neural Engine and 16GB RAM |
|
• CoreML Tools and Hugging Face Transformers libraries
|
• Python 3.9 |
|
|
|
chat.py provides a sample inference script. |
|
We apologize for the current quality of chat.py and appreciate your patience. |
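Under the hood, a chat script like chat.py runs an autoregressive decoding loop: feed the current token sequence to the model, pick the next token from the returned logits, and repeat until an end-of-sequence token appears. A minimal greedy sketch, with a hypothetical stub_predict standing in for the actual CoreML model call (the real script's interface may differ):

```python
def greedy_decode(predict, input_ids, eos_id, max_new_tokens=16):
    # Greedy autoregressive loop: at each step, take the highest-scoring
    # next token; stop on EOS or after max_new_tokens.
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        logits = predict(tokens)          # one score per vocabulary id
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens

# Hypothetical stand-in for the CoreML model call: it scores token
# (last_id + 1) highest, so decoding counts up until EOS (id 5).
def stub_predict(tokens):
    logits = [0.0] * 6
    logits[min(tokens[-1] + 1, 5)] = 1.0
    return logits

print(greedy_decode(stub_predict, [0], eos_id=5))  # → [0, 1, 2, 3, 4]
```

In the real script, predict would wrap the CoreML model's prediction call and the token ids would come from a Hugging Face tokenizer.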
|
|
|
|
|
Prerequisites: |
|
pip install coremltools transformers |
|
|
|
How to RUN: |
|
python chat.py |
|
|
|
Ctrl-D to exit, Ctrl-C to interrupt inference.
|
|
|
Alternative way to run:
|
python chat.py S123 -d /path/to/anemll-LLAMA32-1B-ctx1024 ctx=1024 |
|
|
|
The first time the model loads, macOS will take some time to compile it and place it on the device.

Subsequent loads will be much faster.
|
|
|
Please check the following links for updates:
|
https://huggingface.co/anemll |
|
https://x.com/anemll |
|
https://github.com/anemll |
|
https://anemll.com |
|
|
|
[email protected] |
|
|
|
|