metadata

tags:
  - Llamba
  - recurrent-models
  - distillation
  - cartesia
  - edge
license: apache-2.0
library_name: cartesia-pytorch
datasets:
  - ai2_arc
  - PIQA
  - Winogrande
  - HellaSwag
  - Lambada
  - MMLU
  - OpenBookQA
inference:
  precision: bf16
  hardware: gpu

Llamba Models

The Llamba models (1B, 3B and 8B parameters) are distilled recurrent language models and part of Cartesia's Edge library, designed for efficient, high-performance machine learning applications.

For more details, refer to the paper.

Usage

Llamba on PyTorch

To use Llamba with PyTorch:

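Install the required package (the --no-binary :all: flag makes pip build packages from source instead of installing prebuilt wheels):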

pip install --no-binary :all: cartesia-pytorch
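Because the package is built from source, it can help to confirm that the build finished and the modules are importable before loading the model. A minimal check, assuming only the module name `cartesia_pytorch` from the import in the usage example and the standard-library `importlib.util.find_spec`:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path.

    find_spec locates a top-level module without executing it, so this is a
    cheap sanity check after `pip install` finishes.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the usage example; torch is needed to run the model.
    for name in ("cartesia_pytorch", "transformers", "torch"):
        status = "ok" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```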

Load the model and generate text:

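from transformers import AutoTokenizer
from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel

# Load the Llamba-8B checkpoint from the Hugging Face Hub and move it to the GPU
model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-8B-untied", strict=True).to('cuda')
# Llamba uses the Llama 3.1 tokenizer (a gated repo: request access on the Hub first)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
# Tokenize the prompt and put the token IDs on the same device as the model
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
input_ids = input_ids.to('cuda')
# Generate a continuation (max_length=100) and take the first sequence in the batch
output = model.generate(input_ids, max_length=100)[0]
print(tokenizer.decode(output, skip_special_tokens=True))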
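Conceptually, a recurrent model generates text by carrying a fixed-size state from one token to the next. The sketch below illustrates that loop with a toy, hypothetical `step` function standing in for the network. It assumes greedy decoding and that `max_length` counts prompt plus generated tokens (as in Hugging Face transformers' `generate`); it is not how `LlambaLMHeadModel.generate` is implemented, and its actual decoding options may differ.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A step function maps (state, token_id) -> (logits over the vocabulary, new state).
# The state has a fixed size, so memory per step does not grow with sequence length.
State = List[float]
StepFn = Callable[[State, int], Tuple[List[float], State]]


def greedy_generate(
    step: StepFn,
    init_state: State,
    input_ids: Sequence[int],
    max_length: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding with a recurrent step function (illustrative sketch)."""
    if not input_ids:
        raise ValueError("input_ids must contain at least one token")
    state = init_state
    # Prefill: feed the prompt one token at a time to build up the state.
    for tok in input_ids:
        logits, state = step(state, tok)
    output = list(input_ids)
    # Assumption: max_length counts prompt + generated tokens.
    while len(output) < max_length:
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        output.append(next_id)
        if next_id == eos_id or len(output) == max_length:
            break
        logits, state = step(state, next_id)
    return output


def toy_step(state: State, token_id: int) -> Tuple[List[float], State]:
    """Hypothetical 5-token 'model' with a single decaying scalar state.

    The state is updated each step but, for simplicity, does not influence the
    logits, which always favour token (previous token + 1) mod 5. This stands
    in for the real network and is not Llamba.
    """
    vocab_size = 5
    new_state = [0.5 * state[0] + token_id]  # fixed-size recurrent update
    logits = [1.0 if v == (token_id + 1) % vocab_size else 0.0 for v in range(vocab_size)]
    return logits, new_state


if __name__ == "__main__":
    print(greedy_generate(toy_step, [0.0], [0, 1], max_length=6))  # → [0, 1, 2, 3, 4, 0]
```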

Llamba on MLX

To run Llamba with the Metal framework, see cartesia-metal.

Evaluations

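The Llamba models are designed for efficiency while maintaining strong performance. The tables below report their accuracy (%) on multiple standard benchmarks in zero-shot and few-shot settings. Abbreviations: ARC-C/ARC-E = ARC-Challenge/ARC-Easy, WG = Winogrande, HS = HellaSwag, LMB = LAMBADA, OBQA = OpenBookQA.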

Model	ARC-C (0-shot)	ARC-C (25-shot)	ARC-E (0-shot)	ARC-E (25-shot)	PIQA (0-shot)	PIQA (10-shot)	WG (0-shot)	WG (5-shot)
Llamba-1B	37.2	41.8	69.5	71.2	74.0	74.3	60.6	58.1
Llamba-3B	48.5	53.0	79.0	81.1	78.6	79.5	70.4	72.4
Llamba-8B	54.6	60.0	82.5	85.8	80.9	81.5	73.3	76.9

Model	HS (0-shot)	HS (10-shot)	LMB (0-shot)	LMB (10-shot)	MMLU (0-shot)	MMLU (5-shot)	OBQA (0-shot)	OBQA (10-shot)
Llamba-1B	61.2	60.2	48.4	39.0	38.0	31.3	37.0	38.0
Llamba-3B	73.8	74.3	65.8	60.0	52.7	50.3	42.8	42.8
Llamba-8B	77.6	78.7	69.4	65.0	61.0	60.0	43.4	45.8
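To compare the zero-shot and few-shot columns at a glance, the scores can be summarized per model. The sketch below copies the values from the two tables and computes an unweighted mean over the eight benchmarks plus the few-shot minus 0-shot difference per benchmark; the unweighted mean is an illustrative summary chosen here, not a metric reported in the paper.

```python
from statistics import mean

# Scores copied from the two tables above: benchmark -> (0-shot, few-shot).
# Few-shot counts differ per benchmark (25 for ARC, 5 for WG and MMLU, 10 otherwise).
SCORES = {
    "Llamba-1B": {"ARC-C": (37.2, 41.8), "ARC-E": (69.5, 71.2), "PIQA": (74.0, 74.3),
                  "WG": (60.6, 58.1), "HS": (61.2, 60.2), "LMB": (48.4, 39.0),
                  "MMLU": (38.0, 31.3), "OBQA": (37.0, 38.0)},
    "Llamba-3B": {"ARC-C": (48.5, 53.0), "ARC-E": (79.0, 81.1), "PIQA": (78.6, 79.5),
                  "WG": (70.4, 72.4), "HS": (73.8, 74.3), "LMB": (65.8, 60.0),
                  "MMLU": (52.7, 50.3), "OBQA": (42.8, 42.8)},
    "Llamba-8B": {"ARC-C": (54.6, 60.0), "ARC-E": (82.5, 85.8), "PIQA": (80.9, 81.5),
                  "WG": (73.3, 76.9), "HS": (77.6, 78.7), "LMB": (69.4, 65.0),
                  "MMLU": (61.0, 60.0), "OBQA": (43.4, 45.8)},
}


def summarize(scores):
    """Return model -> (mean 0-shot, mean few-shot, {benchmark: few-shot minus 0-shot})."""
    summary = {}
    for model, benches in scores.items():
        zero = [z for z, _ in benches.values()]
        few = [f for _, f in benches.values()]
        deltas = {b: round(f - z, 1) for b, (z, f) in benches.items()}
        summary[model] = (mean(zero), mean(few), deltas)
    return summary


if __name__ == "__main__":
    for model, (zero, few, deltas) in summarize(SCORES).items():
        worst = min(deltas, key=deltas.get)  # benchmark where few-shot helps least
        print(f"{model}: mean 0-shot {zero:.2f}, mean few-shot {few:.2f}, "
              f"lowest few-shot delta {worst} ({deltas[worst]:+.1f})")
```

On these numbers, the few-shot score is below the 0-shot score for LAMBADA and MMLU at all three sizes, and the unweighted few-shot mean is lower than the 0-shot mean only for Llamba-1B.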

More details on model performance, benchmarks, and evaluation metrics can be found in the paper.