Upload 13 files
Browse files
- README.md +188 -3
- config.json +23 -0
- generation_config.json +6 -0
- gitattributes +35 -0
- merges.txt +0 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +372 -0
- special_tokens_map.json +24 -0
- tokenizer.json +0 -0
- tokenizer_config.json +22 -0
- vocab.json +0 -0
README.md
CHANGED
@@ -1,3 +1,188 @@
- ---
- license: cc-by-nc-
-

---
license: cc-by-nc-sa-4.0
language:
- ca
- va
tags:
- FLOR
- bloom
- Aitana
- catalan
- valencian
pipeline_tag: text-generation
---

# AITANA-6.3B

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model description](#model-description)
- [Intended uses and limitations](#intended-uses-and-limitations)
- [Demo](#demo)
- [How to use](#how-to-use)
- [Limitations and bias](#limitations-and-bias)
- [Training](#training)
- [Evaluation](#evaluation)
- [Additional information](#additional-information)

</details>

## Model description

**AITANA-6.3B** is a text-generation model for causal language modelling with a decoder-only architecture.
It was obtained by continued pre-training of [FLOR-6.3B](https://huggingface.co/projecte-aina/FLOR-6.3B), with an emphasis on data (listed below)
in Valencian (similar to Catalan). Concretely, this first version of the model was trained on a total of 1,304 million tokens across two epochs over the data.

The model uses FLOR-6.3B as its starting point and keeps the same tokenizer.

## Intended uses and limitations

Like **FLOR-6.3B**, **AITANA-6.3B** is a base model for causal language modelling. It can be used as-is for text generation,
although fine-tuning or instruction-tuning on specific tasks is recommended for its final use.

## Demo

The following link provides an interactive demo for testing text generation with the language model:

[Demo link](https://llm-aitana.gplsi.es/)

In the demo you can adjust the number of generated words as well as the decoding technique used by
the model (top-p, top-k) and other parameters such as temperature.

## How to use
```python
import torch
from transformers import pipeline, AutoTokenizer

input_text = "Les corts valencianes han pres la decisió de"

model_id = "gplsi/Aitana-6.3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
generation = generator(
    input_text,
    do_sample=True,
    top_k=10,
    eos_token_id=tokenizer.eos_token_id,
)

print(f"Result: {generation[0]['generated_text']}")
```

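If you prefer explicit control over loading and decoding, the same generation can be done without the pipeline helper. This is a minimal sketch using the standard `transformers` API; `max_new_tokens=50` is an illustrative value, not a recommendation from the model authors:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "gplsi/Aitana-6.3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in bfloat16
    device_map="auto",
)

inputs = tokenizer("Les corts valencianes han pres la decisió de", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=10,
    max_new_tokens=50,  # illustrative length cap
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
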
## Training

### Training data

The training corpus was obtained by web scraping of public data from different sources, such as the
[Official Gazette of the University of Alicante (BOUA)](https://www.boua.ua.es/ca) and [the Official Gazette of the Generalitat Valenciana (DOGV)](https://dogv.gva.es/va), plus curated data provided by
[the Valencian Courts (DSCV and DSCCV)](https://www.cortsvalencianes.es/ca-va/). This gives a total of 1,304 million tokens, according to the following table.

Dataset | Language | Words (per-epoch) | Epochs | Total Tokens |
|---------------------|----------|--------------------|--------------|--------------|
DSCV | va | 31.98M | 2 | 57.05M |
DSCCV | va | 45.59M | 2 | 80.91M |
BOUA | va | 11.65M | 2 | 29.02M |
DOGV | va | 301.59M | 2 | 982.33M |
DOGCV | va | 54.92M | 2 | 154.32M |

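As a sanity check on the figures above, the per-dataset totals add up to the headline token count:

```python
# Total Tokens column from the table above, in millions
totals_m = {"DSCV": 57.05, "DSCCV": 80.91, "BOUA": 29.02, "DOGV": 982.33, "DOGCV": 154.32}
print(f"{sum(totals_m.values()):.2f}M")  # 1303.63M ≈ the 1,304 million tokens quoted above
```
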
Several of the scraped sources had already been used in the FLOR-6.3B training, so the data-collection date of the previous
model was taken into account and those web pages were only scraped from that date onwards.

Information on the datasets used for training is shown below:

- BOUA: Official Bulletin of the University of Alicante. These are documents issued by the University of Alicante in Valencian about grants, calls issued by the university, regulations, resolutions of laws that affect the university environment, and corrections of errors in these same previously issued documents.

- DOGV: Official Journal of the Generalitat Valenciana. This dataset contains official communiqués of different kinds issued by the Generalitat Valenciana, with data entirely in Valencian. It mainly covers measures taken in the legal field, approval of laws, and public-sector communiqués. In this case, we have 18 different documents covering communiqués from 1998 to 2018 and three more recent documents with data from 2019 to 2023.

- DOGCV: also the Official Journal of the Generalitat Valenciana, but only the historical documents from 1980 to 1997.

- DSCV: Journal of the Valencian Parliament. This dataset contains transcriptions of the interventions made during the plenary sessions of the Valencian Parliament by the different participants. It covers data from 1999 up to 2022; each transcript comprises one .html file.

- DSCCV: a dataset from the Valencian Parliament journal, centred on transcriptions of the different commissions held. As in the previous case, it is separated into one file per transcription.


### Training parameters

Since a large context window was desired for text generation, an input size of 2048
tokens was used, with a minimum context window of 512 tokens when input sequences had to be truncated. 80% of the data was used for the training stage,
while 20% was used for the evaluation stage. A summary of the parameters used during training is given in the following table:

Parameter | Value |
|---------------------|---|
Epochs | 1 |
Learning Rate | 2e-5 |
Warmup Steps | 0 |
Precision | bf16 |
Weight decay | 1e-1 |
Training Fraction | 0.8 |
Evaluation Fraction | 0.2 |
Input size (tokens) | 2048 |
Minimum context window (tokens) | 512 |
Training time (hours/epoch) | 40 |

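The 2048/512 policy can be read as: pack each tokenized document into 2048-token training blocks, and keep a shorter trailing block only if it still offers at least 512 tokens of context. The sketch below is an illustrative reconstruction of that policy, not the authors' preprocessing code:

```python
def chunk_ids(token_ids, block_size=2048, min_context=512):
    """Split one tokenized document into training blocks.

    Full blocks of `block_size` tokens are always kept; a shorter final
    block is kept only if it still has at least `min_context` tokens.
    """
    blocks = []
    for start in range(0, len(token_ids), block_size):
        block = token_ids[start:start + block_size]
        if len(block) == block_size or len(block) >= min_context:
            blocks.append(block)
    return blocks

# e.g. a 4700-token document -> blocks of 2048, 2048 and 604 tokens
print([len(b) for b in chunk_ids(list(range(4700)))])
```
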
### Devices

A total of 4 A100 GPUs with a maximum capacity of 40 GB each were used to train the model, which meant a training time of approximately
40 hours per epoch. A mini-batch size of 2 per device was used, with an effective batch size of 32 for each backpropagation step.

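Note that the two batch sizes are consistent with simple gradient accumulation: with 4 GPUs at a mini-batch of 2 each, reaching an effective batch of 32 takes 4 accumulation steps. This factor is inferred from the stated sizes; it is not reported explicitly.

```python
per_device_batch = 2   # mini-batch size per GPU (stated)
num_gpus = 4           # A100s used (stated)
effective_batch = 32   # batch size used for backpropagation (stated)
accum_steps = effective_batch // (per_device_batch * num_gpus)
print(accum_steps)     # 4 gradient-accumulation steps (inferred, not stated)
```
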
### Distributed Training Strategy

A distributed training strategy called Fully Sharded Data Parallel ([FSDP](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html))
was used. With this, the model was sharded across the 4 A100s available for training, with a mini-batch size of 2 as
discussed previously.

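For reference, a minimal FSDP wrap in PyTorch looks like the sketch below. This is a generic illustration of the technique, assuming a standard `torch.distributed` launch (e.g. `torchrun --nproc_per_node=4 train.py`); it is not the authors' training script.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())  # single-node assumption: rank == local GPU id

model = AutoModelForCausalLM.from_pretrained(
    "projecte-aina/FLOR-6.3B", torch_dtype=torch.bfloat16
)
# Shard parameters, gradients and optimizer state across the 4 GPUs.
# A real run would usually pass an auto_wrap_policy for the transformer blocks.
model = FSDP(model.cuda())

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=1e-1)
# ...standard causal-LM training loop over 2048-token mini-batches of size 2...
```
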
### Languages

In addition to the data already used for the training of FLOR-6.3B, data entirely in Valencian from the sources mentioned in
the previous section were used.


## Evaluation
The model was evaluated using the loss function and perplexity during the training stage, and the same metrics were
obtained during the evaluation stage. Given the small amount of data, evaluation was performed at the end of each epoch.

*Figure: loss and perplexity for training and evaluation.*

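Perplexity here is the usual language-modelling metric, i.e. the exponential of the mean per-token cross-entropy loss, so one value is directly derivable from the other:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Standard LM perplexity: exp of the average per-token cross-entropy."""
    return math.exp(mean_ce_loss)

print(perplexity(2.0))  # a loss of 2.0 nats corresponds to perplexity ≈ 7.39
```
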
### Results

Future benchmark results will be published here.

## Additional information

### Author

[GPLSI](https://gplsi.dlsi.ua.es/)

### Contact

[GPLSI](https://gplsi.dlsi.ua.es/)

### Copyright

[GPLSI](https://gplsi.dlsi.ua.es/)

### License

[Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)

This model is free to use for personal and research purposes. However, a commercial license is required for commercial applications.

### Funding

ILENIA-VIVES project <<2022/TL22/00215334>>

### Disclaimer

The GPLSI research group is not responsible for any inappropriate use of this language model, nor for any bias, toxic language,
or sensitive information that the model may contain. This language model has been developed for research purposes.
config.json
ADDED
@@ -0,0 +1,23 @@
{
  "_name_or_path": "projecte-aina/FLOR-6.3B",
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "BloomForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "bloom",
  "n_head": 32,
  "n_layer": 30,
  "pretraining_tp": 1,
  "slow_but_exact": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "vocab_size": 50257
}

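The "6.3B" in the model name follows from these hyperparameters. A rough parameter count, ignoring biases and layernorms (values taken from the config above):

```python
v, h, n_layers = 50257, 4096, 30  # vocab_size, hidden_size, n_layer
attn = 3 * h * h + h * h          # query_key_value + output projection
mlp = h * 4 * h + 4 * h * h       # dense_h_to_4h + dense_4h_to_h
total = v * h + n_layers * (attn + mlp)
print(f"{total / 1e9:.2f}B")      # ≈ 6.25B parameters
```
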
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.40.0"
}

gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

merges.txt
ADDED
The diff for this file is too large to render.
See raw diff
model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:368aa6579a3552ce560bb3e5c64da222939d5c0e0e36302db66a219c3db710dd
size 4976378400

model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88c0dc637f27a9618a6764c73c113697c9263e21b59b9e78e3fd8093398285a0
size 4967384080

model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fef39e658d186c0d01dfcf434359bb567c54da7a969c8a8915033ac7ea416a02
size 2550809384

model.safetensors.index.json
ADDED
@@ -0,0 +1,372 @@
{
  "metadata": {
    "total_size": 12494528512
  },
  "weight_map": {
    "transformer.h.0.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.11.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.12.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.2.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.20.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.23.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.23.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.24.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.3.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.ln_f.bias": "model-00003-of-00003.safetensors",
    "transformer.ln_f.weight": "model-00003-of-00003.safetensors",
    "transformer.word_embeddings.weight": "model-00001-of-00003.safetensors",
    "transformer.word_embeddings_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.word_embeddings_layernorm.weight": "model-00001-of-00003.safetensors"
  }
}

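A consistency check on the metadata: `total_size` counts tensor bytes only (the shard files above are slightly larger because each carries a safetensors header), and at 2 bytes per bfloat16 value it corresponds to the full parameter count:

```python
total_size = 12_494_528_512  # from model.safetensors.index.json
print(total_size / 2 / 1e9)  # ≈ 6.25B bfloat16 parameters
shards = [4_976_378_400, 4_967_384_080, 2_550_809_384]  # shard file sizes above
print(sum(shards) - total_size)  # small positive remainder: the safetensors headers
```
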
special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1,22 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}

vocab.json
ADDED
The diff for this file is too large to render.
See raw diff