Update README.md

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).

This model card also includes instructions for compiling other SOLAR models with different settings if this combination isn't what you are looking for.

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.

## Set up the environment

First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled, but had not been updated to 2.16 as of 1/13/24. However, you will need version 2.16 to use these binaries; 2.16 shows a significant performance increase over 2.15 for Llama-based models.

The commands below will update your 2.15 libraries to 2.16.

```
# Upgrade the Neuron system packages (pin each package to the 2.16 release
# versions from the Neuron release notes; left unpinned here for brevity).
sudo apt-get update -y \
 && sudo apt-get install -y aws-neuronx-dkms aws-neuronx-collectives aws-neuronx-runtime-lib aws-neuronx-tools
```
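
The Python-side Neuron libraries need to match the system packages. A sketch of the corresponding `pip` upgrade, assuming the standard Neuron package index (exact 2.16 version pins are omitted):

```
pip3 install --upgrade neuronx-cc torch-neuronx transformers-neuronx optimum-neuron \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com

# Confirm the installed versions match the 2.16 release notes.
pip3 list | grep -i neuron
```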
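
With the environment updated, the checkpoint can be loaded with `optimum-neuron` and wrapped in a text-generation pipeline. A minimal sketch, assuming a recent `optimum-neuron` that exposes `pipeline` and using this repository's directory name as a placeholder repo id:

```
from optimum.neuron import NeuronModelForCausalLM, pipeline
from transformers import AutoTokenizer

# Load the precompiled Neuron checkpoint (repo id assumed); because it is
# already compiled, no export step is needed here.
model = NeuronModelForCausalLM.from_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")
tokenizer = AutoTokenizer.from_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Hi, my name is"))
```

A run like this produces output similar to the (masked) sample below: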
```
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
[{'generated_text': 'Hi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if'}]
```

## Compiling for different instances or settings

If this repository doesn't have the exact version or settings you need, you can compile your own checkpoint as sketched below. The `compiler_args` shown are an assumption reconstructed from the repository name; adjust them for your instance.

```
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

# num_cores should be changed based on the instance. inf2.24xlarge has 6 Neuron
# processors (two cores each), so 12 cores in total.
input_shapes = {"batch_size": 1, "sequence_length": 4096}
compiler_args = {"num_cores": 8, "auto_cast_type": "bf16"}  # assumed values

# Export the base model to the Neuron format and save the compiled checkpoint.
model = NeuronModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-v1.0", export=True, **compiler_args, **input_shapes
)
model.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")

# Save the tokenizer alongside the compiled model.
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
tokenizer.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")
```
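
Note that `input_shapes` are fixed at compile time: the compiled model serves the given `batch_size`, and `sequence_length` caps the total context, so choose values that match your workload before compiling.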

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, pass the repo revision that matches the version of `neuronx` you are running so the correct serialized checkpoints are loaded.
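
For example, the revision can be selected at load time. A sketch (the repo id and tag name are placeholders, not actual tags from this repository):

```
from optimum.neuron import NeuronModelForCausalLM

# Choose the tag that matches your installed neuronx version.
model = NeuronModelForCausalLM.from_pretrained(
    "SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096",
    revision="2.16",
)
```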