metadata
tags:
  - Llamba
  - recurrent-models
  - distillation
  - cartesia
  - edge
license: apache-2.0
library_name: cartesia-pytorch
datasets:
  - ai2_arc
  - PIQA
  - Winogrande
  - HellaSwag
  - Lambada
  - MMLU
  - OpenBookQA
inference:
  precision: bf16
  hardware: gpu

Llamba Models

The Llamba models are part of Cartesia's Edge library, designed for efficient, high-performance machine learning applications.

For more details, refer to the paper.


Usage

Llamba on PyTorch

To use Llamba with PyTorch:

  1. Install the required package:
pip install --no-binary :all: cartesia-pytorch
  2. Load and run the model:
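from transformers import AutoTokenizer
from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel

# Load the pretrained weights and move the model to the GPU.
model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-8B", strict=True).to('cuda')
# The prompt is tokenized with the Llama-3.1-8B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
# Inputs must live on the same device as the model.
input_ids = input_ids.to('cuda')
# Generate a continuation and keep the first (only) sequence in the batch.
output = model.generate(input_ids, max_length=100)[0]
print(tokenizer.decode(output, skip_special_tokens=True))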
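
Before running the snippet above, it can help to confirm that the packages it imports are installed. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module names are taken from the import lines above plus `torch`, which is assumed to be the underlying PyTorch dependency.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


# Top-level modules used by the usage snippet (torch is an assumed dependency).
REQUIRED = ["torch", "transformers", "cartesia_pytorch"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that the modules can be located; it does not verify versions or that a CUDA device is available.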

Llamba on MLX

To run Llamba with the Metal framework, see cartesia-metal.


Evaluations

The Llamba models have been evaluated on multiple standard benchmarks, demonstrating efficiency gains while maintaining strong performance. The tables below report the score for each benchmark in zero-shot and few-shot settings:

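| Model | ARC-C (0-shot) | ARC-C (25-shot) | ARC-E (0-shot) | ARC-E (25-shot) | PIQA (0-shot) | PIQA (10-shot) | WG (0-shot) | WG (5-shot) |
|---|---|---|---|---|---|---|---|---|
| Llamba-1B | 37.2 | 41.8 | 69.5 | 71.2 | 74.0 | 74.3 | 60.6 | 58.1 |
| Llamba-3B | 48.5 | 53.0 | 79.0 | 81.1 | 78.6 | 79.5 | 70.4 | 72.4 |
| Llamba-8B | 54.6 | 60.0 | 82.5 | 85.8 | 80.9 | 81.5 | 73.3 | 76.9 |

| Model | HS (0-shot) | HS (10-shot) | LMB (0-shot) | LMB (10-shot) | MMLU (0-shot) | MMLU (5-shot) | OBQA (0-shot) | OBQA (10-shot) |
|---|---|---|---|---|---|---|---|---|
| Llamba-1B | 61.2 | 60.2 | 48.4 | 39.0 | 38.0 | 31.3 | 37.0 | 38.0 |
| Llamba-3B | 73.8 | 74.3 | 65.8 | 60.0 | 52.7 | 50.3 | 42.8 | 42.8 |
| Llamba-8B | 77.6 | 78.7 | 69.4 | 65.0 | 61.0 | 60.0 | 43.4 | 45.8 |

Abbreviations: WG = Winogrande, HS = HellaSwag, LMB = Lambada, OBQA = OpenBookQA.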
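
For a quick single-number comparison across model sizes, the zero-shot columns can be averaged. The sketch below copies the zero-shot scores from the tables above and computes an unweighted mean per model. This average is an illustrative aggregate, not a metric reported in the source, and the names `ZERO_SHOT` and `mean_score` are hypothetical.

```python
# Zero-shot scores copied from the evaluation tables above.
ZERO_SHOT = {
    "Llamba-1B": {"ARC-C": 37.2, "ARC-E": 69.5, "PIQA": 74.0, "WG": 60.6,
                  "HS": 61.2, "LMB": 48.4, "MMLU": 38.0, "OBQA": 37.0},
    "Llamba-3B": {"ARC-C": 48.5, "ARC-E": 79.0, "PIQA": 78.6, "WG": 70.4,
                  "HS": 73.8, "LMB": 65.8, "MMLU": 52.7, "OBQA": 42.8},
    "Llamba-8B": {"ARC-C": 54.6, "ARC-E": 82.5, "PIQA": 80.9, "WG": 73.3,
                  "HS": 77.6, "LMB": 69.4, "MMLU": 61.0, "OBQA": 43.4},
}


def mean_score(scores):
    """Unweighted mean over the benchmarks in `scores`."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    for model, scores in ZERO_SHOT.items():
        print(f"{model}: {mean_score(scores):.2f}")
```

An unweighted mean treats every benchmark equally regardless of its size or difficulty, so it should be read as a rough summary only.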

More details on model performance, benchmarks, and evaluation metrics can be found in the paper.