Part of the Convenient Quants collection ("Small body, high IQ").
Generation speed: 175 tok/sec on an M4 Mac.
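A tok/sec figure is a throughput number: tokens generated divided by elapsed wall-clock time. mlx-lm's `generate(..., verbose=True)` already prints its own tokens-per-sec statistics. To time some other token stream yourself, here is a minimal stdlib sketch. `tokens_per_second` and its injectable `clock` parameter are hypothetical helpers, not part of mlx-lm, and the sketch assumes the iterator yields exactly one item per generated token.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(
    tokens: Iterable[object],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return tokens per wall-clock second.

    Hypothetical helper. The timer starts before the first token is
    requested, so any prompt processing done lazily by the iterator is
    included in the elapsed time.
    """
    start = clock()
    count = 0
    for _ in tokens:  # assumption: one item per generated token
        count += 1
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed


if __name__ == "__main__":
    # Stand-in for a real token stream: 350 tokens over 2 fake seconds.
    fake_clock = iter([0.0, 2.0]).__next__
    print(tokens_per_second(range(350), clock=fake_clock))  # 175.0
```

Injecting the clock keeps the helper deterministic under test; in real use the default `time.perf_counter` measures actual elapsed time.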
Performance evaluation of this quantized model:

| Task | acc | acc_norm | stderr |
|---|---|---|---|
| arc_challenge | 0.33 | 0.36 | 0.014 |
| arc_easy | 0.45 | 0.38 | 0.009 |
| boolq | 0.62 | 0.62 | 0.008 |
| hellaswag | 0.44 | 0.52 | 0.004 |
| openbookqa | 0.21 | 0.38 | 0.021 |
| piqa | 0.70 | 0.69 | 0.010 |
| winogrande | 0.55 | 0.55 | 0.013 |
Performance evaluation of the source model:

| Task | acc | acc_norm | stderr |
|---|---|---|---|
| arc_challenge | 0.34 | 0.35 | 0.013 |
| arc_easy | 0.46 | 0.39 | 0.010 |
| boolq | 0.62 | 0.62 | 0.008 |
| hellaswag | 0.44 | 0.53 | 0.004 |
| openbookqa | 0.23 | 0.39 | 0.021 |
| piqa | 0.70 | 0.69 | 0.010 |
| winogrande | 0.56 | 0.55 | 0.013 |
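The quantized and source scores can be compared task by task. One rough way to judge whether a gap matters is to set it against the reported standard errors. The sketch below assumes the single stderr column is a reasonable scale for both acc and acc_norm, and it combines the two runs' errors as if they were independent, z·√(se_q² + se_s²) with z = 1.96. That ignores the fact that both models answered the same questions, so treat the result as a heuristic, not a significance test. `compare`, `QUANT`, and `SOURCE` are illustrative names; the numbers are copied from the two evaluations above.

```python
import math

# task -> (acc, acc_norm, stderr), copied from the two evaluations above
QUANT = {
    "arc_challenge": (0.33, 0.36, 0.014),
    "arc_easy": (0.45, 0.38, 0.009),
    "boolq": (0.62, 0.62, 0.008),
    "hellaswag": (0.44, 0.52, 0.004),
    "openbookqa": (0.21, 0.38, 0.021),
    "piqa": (0.70, 0.69, 0.010),
    "winogrande": (0.55, 0.55, 0.013),
}
SOURCE = {
    "arc_challenge": (0.34, 0.35, 0.013),
    "arc_easy": (0.46, 0.39, 0.010),
    "boolq": (0.62, 0.62, 0.008),
    "hellaswag": (0.44, 0.53, 0.004),
    "openbookqa": (0.23, 0.39, 0.021),
    "piqa": (0.70, 0.69, 0.010),
    "winogrande": (0.56, 0.55, 0.013),
}


def compare(quant, source, z=1.96):
    """Return (task, d_acc, d_norm, margin, within) for each task.

    margin = z * sqrt(se_q**2 + se_s**2), a rough 95% band for the
    difference of two estimates treated as independent (assumption).
    """
    rows = []
    for task, (acc_q, norm_q, se_q) in quant.items():
        acc_s, norm_s, se_s = source[task]
        margin = z * math.hypot(se_q, se_s)
        d_acc = acc_q - acc_s
        d_norm = norm_q - norm_s
        within = abs(d_acc) <= margin and abs(d_norm) <= margin
        rows.append((task, d_acc, d_norm, margin, within))
    return rows


if __name__ == "__main__":
    for task, d_acc, d_norm, margin, within in compare(QUANT, SOURCE):
        verdict = "within" if within else "outside"
        print(f"{task:<14} d_acc={d_acc:+.3f} d_norm={d_norm:+.3f} "
              f"band=+/-{margin:.3f} {verdict}")
```

With these numbers every task falls inside its band. The largest single gap is openbookqa acc (0.21 vs 0.23), and the mean acc difference across the seven tasks is about -0.007.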
This model Lucy-128k-dq68-mlx was converted to MLX format from Menlo/Lucy-128k using mlx-lm version 0.26.0.
```bash
pip install mlx-lm
```
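```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer (local path or Hugging Face repo id)
model, tokenizer = load("Lucy-128k-dq68-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# verbose=True streams the output and prints generation statistics
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```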