This model is "Built with Llama".

It is based on meta-llama/Meta-Llama-3.1-8B-Instruct and was created with mergekit. The mergekit configuration we used is in mergekit_moe_config.yml.

Note that this is the raw model as it comes out of the merge. Its router networks are still randomly initialized, so it will not perform better than any single one of its expert models. The model requires further training before use.
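
The following is a minimal sketch of why an untrained router cannot help. It assumes a Mixtral-style sparse MoE block (softmax over the router logits, top-2 selection, renormalization of the selected weights), which is the usual layout for MoE merges of Llama-family experts; the function names, sizes and toy experts are hypothetical and chosen only for illustration. With small random router weights the routing probabilities are close to uniform, so the choice of experts carries no learned signal.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden, router_weights, top_k=2):
    """Return (expert_index, weight) pairs for one token (hypothetical helper)."""
    # One router logit per expert: dot product of a router row with the hidden state
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    # Keep the top_k experts and renormalize their weights so they sum to 1
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(hidden, router_weights, experts, top_k=2):
    # Weighted sum of the outputs of the selected experts
    out = [0.0] * len(hidden)
    for i, w in route(hidden, router_weights, top_k):
        expert_out = experts[i](hidden)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out


rng = random.Random(0)
hidden_size, num_experts = 16, 8

# Randomly initialized router, standing in for the untrained router of the raw model
router = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in range(num_experts)]


def toy_expert(x):
    # Hypothetical stand-in for an expert MLP
    return [2.0 * v for v in x]


# Illustrative case: all experts identical
experts = [toy_expert] * num_experts
token = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]

print(route(token, router))
print(moe_forward(token, router, experts)[:4], toy_expert(token)[:4])
```

If the experts were identical copies of one MLP (the card does not state whether that is the case here), the renormalized weights sum to 1 and the layer reduces to that single MLP whatever the router picks. If the experts differ, a random router simply mixes them arbitrarily. In both cases, training the router is what could make the mixture better than one expert.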

In total, this model has 47.5B parameters, slightly more than Mixtral 8x7B with its 46.7B parameters.
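
The parameter counts can be reproduced with a short calculation. The sketch below assumes the standard Llama 3.1 8B hyperparameters (vocabulary 128,256, hidden size 4,096, MLP size 14,336, 32 layers, 8 KV heads of dimension 128, untied input and output embeddings, no biases) and a Mixtral-style layout in which only the MLP is replicated per expert, while attention, embeddings and norms are shared and each layer gets a bias-free linear router. The `Config` class and `count_params` function are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    num_kv_heads: int
    head_dim: int
    num_experts: int = 1  # 1 = dense model, >1 = sparse MoE


def count_params(c: Config) -> int:
    kv_dim = c.num_kv_heads * c.head_dim
    # Shared attention: q and o are hidden x hidden (assumes num_heads * head_dim == hidden_size),
    # k and v project to the smaller grouped-query dimension
    attn = 2 * c.hidden_size * c.hidden_size + 2 * c.hidden_size * kv_dim
    # One SwiGLU MLP: gate, up and down projections
    mlp = 3 * c.hidden_size * c.intermediate_size
    # Router: hidden -> num_experts, only present in the MoE model
    router = c.hidden_size * c.num_experts if c.num_experts > 1 else 0
    norms = 2 * c.hidden_size  # two RMSNorm weight vectors per layer
    per_layer = attn + c.num_experts * mlp + router + norms
    # Untied input embedding and LM head, plus the final RMSNorm
    embeddings = 2 * c.vocab_size * c.hidden_size
    return embeddings + c.num_layers * per_layer + c.hidden_size


llama_8b = Config(128256, 4096, 14336, 32, 8, 128)
moe_8x8b = Config(128256, 4096, 14336, 32, 8, 128, num_experts=8)
mixtral_8x7b = Config(32000, 4096, 14336, 32, 8, 128, num_experts=8)

for name, cfg in [("Llama 3.1 8B", llama_8b), ("MoE 8x8B", moe_8x8b), ("Mixtral 8x7B", mixtral_8x7b)]:
    print(f"{name}: {count_params(cfg) / 1e9:.2f}B")
# Llama 3.1 8B: 8.03B
# MoE 8x8B: 47.49B
# Mixtral 8x7B: 46.70B
```

Under these assumptions the two MoE models share the same per-layer structure, and the roughly 0.8B difference comes entirely from the larger Llama 3.1 vocabulary (128,256 vs. 32,000 tokens) in the untied embedding and LM head matrices.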

Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 Philip May, Deutsche Telekom AG.
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Model details

Weights: Safetensors, BF16 tensor type, 47.5B params

Repository: deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw
