FabioDataGeek committed · Commit e35c725 · verified · 1 parent: 910ec95

Upload 13 files

README.md CHANGED
@@ -1,3 +1,188 @@
- ---
- license: cc-by-nc-nd-4.0
- ---
---
license: cc-by-nc-sa-4.0
language:
- ca
- va
tags:
- FLOR
- bloom
- Aitana
- catalan
- valencian
pipeline_tag: text-generation
---

# AITANA-6.3B

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model description](#model-description)
- [Intended uses and limitations](#intended-uses-and-limitations)
- [Demo](#demo)
- [How to use](#how-to-use)
- [Limitations and bias](#limitations-and-bias)
- [Training](#training)
- [Evaluation](#evaluation)
- [Additional information](#additional-information)

</details>

## Model description

**AITANA-6.3B** is a decoder-only text-generation model for causal language modelling.
It was obtained by continued pre-training of [FLOR-6.3B](https://huggingface.co/projecte-aina/FLOR-6.3B), with emphasis on data (listed below)
in Valencian (a variety of Catalan). Concretely, this first version of the model was trained for two epochs over the data, seeing a total of roughly 1,304 million tokens.

AITANA-6.3B starts from the FLOR-6.3B checkpoint and uses the same tokenizer.
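
As a quick orientation, the continued pre-training setup can be sketched as follows. This is a minimal sketch consistent with the description above, not the authors' published training script:

```python
# Minimal sketch of continued pre-training from FLOR-6.3B.
# Assumption: the actual GPLSI training script is not published; this only
# shows the starting checkpoint and the reused tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "projecte-aina/FLOR-6.3B"
tokenizer = AutoTokenizer.from_pretrained(base_id)   # reused unchanged by AITANA
model = AutoModelForCausalLM.from_pretrained(base_id)

# ...continue standard causal-LM training on the Valencian corpus
# described in the Training section below...
```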
39
+
40
+ ## Intended uses and limitations
41
+
42
+ As **FLOR-6.3B**, **AITANA-6.3B** is a base model that can be used for causal language modelling, it can be used as is for text generation,
43
+ although fine/instruction-tuning on specific tasks is recommended for its final use.
44
+
45
+ ## Demo
46
+
47
+ In the following link you can access an interactive demo to test the text generation in the language model:
48
+
49
+ Demo link(https://llm-aitana.gplsi.es/)
50
+
51
+ In the demo you can adjust the number of words generated as well as the decoding technique to be used by
52
+ the model (top p, top k) and other parameters such as temperature.
53
+
54
+ ## How to use
55
+ ```python
56
+ import torch
57
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
58
+
59
+ input_text = "Les corts valencianes han pres la decisió de"
60
+
61
+ model_id = "gplsi/Aitana-6.3B"
62
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
63
+ generator = pipeline(
64
+ "text-generation",
65
+ model=model_id,
66
+ tokenizer=tokenizer,
67
+ torch_dtype=torch.bfloat16,
68
+ trust_remote_code=True,
69
+ device_map="auto",
70
+ )
71
+ generation = generator(
72
+ input_text,
73
+ do_sample=True,
74
+ top_k=10,
75
+ eos_token_id=tokenizer.eos_token_id,
76
+ )
77
+
78
+ print(f"Result: {generation[0]['generated_text']}")
79
+
80
+ ```
81
+
82
+ ## Training
83
+
84
+ ### Training data
85
+
86
+ The training corpus has been obtained using web scraping on public data from different sources such as the
87
+ [Official Gazette of the University of Alicante (BOUA)](https://www.boua.ua.es/ca), [the Official Gazette of the Generalitat Valenciana (DOGV)](https://dogv.gva.es/va) and accurated data provided by
88
+ [the Valencian Courts (DSCV and DSCCV)](https://www.cortsvalencianes.es/ca-va/). Giving a total of 1.304 million tokens, according to the following table.
89
+
90
+ Dataset | Language | Words (per-epoch) | Epochs | Total Tokens |
91
+ |---------------------|----------|--------------------|--------------|--------------|
92
+ DSCV | va | 31.98M | 2 | 57.05M |
93
+ DSCCV | va | 45.59M | 2 | 80.91M |
94
+ BOUA | va | 11.65M | 2 | 29.02M |
95
+ DOGV | va | 301.59M | 2 | 982.33M |
96
+ DOGCV | va | 54.92M | 2 | 154.32M |
97
+
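
Totals like those in the table can be reproduced with the tokenizer shared by FLOR and AITANA. A hedged sketch, since the card does not publish its counting script (`count_tokens` is a hypothetical helper):

```python
# Hypothetical helper for reproducing corpus token counts with the
# FLOR/AITANA tokenizer; not the authors' actual counting script.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gplsi/Aitana-6.3B")

def count_tokens(documents):
    """Total number of tokens over an iterable of raw text documents."""
    return sum(len(tokenizer(doc).input_ids) for doc in documents)

print(count_tokens(["Hola món", "Bon dia a tothom"]))
```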

Several of the scraped sources had already been used to train FLOR-6.3B, so the data-collection date of the previous
model was taken into account and those web pages were only scraped from that date onwards.

Information on the datasets used for training is shown below:

- BOUA: Official Gazette of the University of Alicante. These are documents issued by the University of Alicante in Valencian about grants, calls issued by the university, regulations, resolutions of laws that affect the university environment, and corrections of errors in these same previously issued documents.

- DOGV: Official Gazette of the Generalitat Valenciana. This dataset contains official communiqués of different kinds issued by the Generalitat Valenciana, with data entirely in Valencian. It mainly covers measures taken in the legal field, approval of laws and public-sector communiqués. In this case, we have 18 different documents covering communiqués from 1998 to 2018 and three more recent documents with data from 2019 to 2023.

- DOGCV: also the Official Gazette of the Generalitat Valenciana, but only the historical documents from 1980 to 1997.

- DSCV: Journal of the Valencian Parliament. This dataset contains transcriptions of the different interventions made during the plenary sessions in the Valencian Parliament by the different participants. It covers data from 1999 up to 2022; each transcript comprises an .html file.

- DSCCV: a dataset of the Valencian Parliament journal centred on transcriptions of the different committee sessions held. As in the previous case, it is separated into one file per transcription.


### Training parameters

A large context window was desired when generating text, so an input size of 2048
tokens was used, with a minimum context window of 512 tokens when input sequences had to be truncated (see the sketch after the table below). 80% of the data was used for the training stage,
while 20% was used for the evaluation stage. A summary of the parameters used during training is shown in the following table:

| Parameter | Value |
|---------------------|-------|
| Epochs | 1 |
| Learning Rate | 2e-5 |
| Warmup Steps | 0 |
| Precision | bf16 |
| Weight decay | 1e-1 |
| Training Fraction | 0.8 |
| Evaluation Fraction | 0.2 |
| Input size (tokens) | 2048 |
| Minimum context window (tokens) | 512 |
| Training time (hours/epoch) | 40 |
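
The 2048/512 figures suggest a packing/truncation step like the one below. This is only a sketch of one plausible implementation; the exact chunking logic is not published:

```python
# Plausible chunking consistent with the parameters above (assumption:
# the authors' exact preprocessing is not published).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gplsi/Aitana-6.3B")

INPUT_SIZE = 2048   # tokens per training example
MIN_CONTEXT = 512   # shortest truncated tail kept as its own example

def chunk_document(text: str) -> list[list[int]]:
    """Split one document into 2048-token chunks, keeping a truncated
    final chunk only if it still provides at least 512 tokens of context."""
    ids = tokenizer(text).input_ids
    chunks = [ids[i:i + INPUT_SIZE] for i in range(0, len(ids), INPUT_SIZE)]
    return [c for c in chunks if len(c) == INPUT_SIZE or len(c) >= MIN_CONTEXT]
```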

### Devices

A total of 4 A100 GPUs with a maximum capacity of 40 GB each were used to train the model. This meant a training time of approximately
40 hours per epoch, using a mini-batch size of 2 and a batch size of 32 to compute back-propagation.
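
One consistent reading of these numbers, assuming the batch size of 32 is the effective batch per optimizer step and the gap to the mini-batch of 2 is covered by gradient accumulation (the card does not state this explicitly):

```python
# Assumed relationship between the reported batch sizes; the gradient
# accumulation factor is inferred, not stated in the card.
per_device_mini_batch = 2   # mini-batch per GPU per forward/backward pass
num_gpus = 4                # 4 x A100 40GB
grad_accum_steps = 4        # inferred so that the numbers line up
effective_batch = per_device_mini_batch * num_gpus * grad_accum_steps
assert effective_batch == 32
```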

### Distributed Training Strategy

A distributed training strategy called Fully Sharded Data Parallel ([FSDP](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html))
was used. With it, the model was sharded across the 4 A100s available for training, with a mini-batch size of 2 per device as
previously discussed.
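
A minimal FSDP sketch for this setup, wrapping at the granularity of each BLOOM block. This is an illustrative skeleton under the card's description, not the authors' actual launcher:

```python
# Minimal FSDP skeleton (launch with: torchrun --nproc_per_node=4 train.py).
# Illustrative only; the authors' actual training code is not published.
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.bloom.modeling_bloom import BloomBlock

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("projecte-aina/FLOR-6.3B")

# Shard parameters at the granularity of each BLOOM transformer block.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={BloomBlock}
)
model = FSDP(
    model,
    auto_wrap_policy=wrap_policy,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),  # bf16, as in the card
    device_id=local_rank,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.1)
# ...training loop with mini-batches of 2 per device goes here...
```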

### Languages

In addition to the data already used for the training of FLOR-6.3B, data entirely in Valencian from the sources mentioned in
the previous section has been used.


## Evaluation
The model was evaluated using the loss function and perplexity during the training stage, and these metrics were also
obtained during the evaluation stage. Due to the low amount of data, the evaluation was performed at the end of each epoch.
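
Perplexity here is the standard causal-LM quantity, i.e. the exponential of the mean cross-entropy loss, so it can be derived directly from the reported loss:

```python
# Standard identity: perplexity = exp(mean cross-entropy loss in nats).
import math

def perplexity(mean_loss: float) -> float:
    return math.exp(mean_loss)

print(perplexity(2.0))  # a mean eval loss of 2.0 -> perplexity ~ 7.39
```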

*Loss and perplexity for train and evaluation.*

### Results

Benchmark results will be added here in a future update.

## Additional information

### Author

GPLSI (https://gplsi.dlsi.ua.es/)

### Contact

GPLSI (https://gplsi.dlsi.ua.es/)


### Copyright

GPLSI (https://gplsi.dlsi.ua.es/)

### License

Attribution-NonCommercial-ShareAlike 4.0 International

This model is free for personal and research use. However, a commercial license is required for commercial applications.

### Funding

ILENIA-VIVES project <<2022/TL22/00215334>>

### Disclaimer

The GPLSI research group is not responsible for any inappropriate use of this language model, nor for any bias, toxic language
or sensitive information that the model may contain. This language model has been developed for research purposes.
config.json ADDED
@@ -0,0 +1,23 @@
{
  "_name_or_path": "projecte-aina/FLOR-6.3B",
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "BloomForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "bloom",
  "n_head": 32,
  "n_layer": 30,
  "pretraining_tp": 1,
  "slow_but_exact": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.0",
  "use_cache": true,
  "vocab_size": 50257
}
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.40.0"
}
gitattributes ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:368aa6579a3552ce560bb3e5c64da222939d5c0e0e36302db66a219c3db710dd
size 4976378400
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88c0dc637f27a9618a6764c73c113697c9263e21b59b9e78e3fd8093398285a0
size 4967384080
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fef39e658d186c0d01dfcf434359bb567c54da7a969c8a8915033ac7ea416a02
size 2550809384
model.safetensors.index.json ADDED
@@ -0,0 +1,372 @@
{
  "metadata": {
    "total_size": 12494528512
  },
  "weight_map": {
    "transformer.h.0.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.0.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.1.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.10.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.11.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.11.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.11.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.12.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.12.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.13.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.14.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.15.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.16.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.17.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.18.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.19.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.2.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.2.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.20.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.20.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.21.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_4h_to_h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_4h_to_h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.22.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.input_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.23.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.23.mlp.dense_h_to_4h.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.mlp.dense_h_to_4h.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.post_attention_layernorm.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.dense.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.dense.weight": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.query_key_value.bias": "model-00002-of-00003.safetensors",
    "transformer.h.23.self_attention.query_key_value.weight": "model-00002-of-00003.safetensors",
    "transformer.h.24.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.24.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.25.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.26.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.27.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.28.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.input_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_4h_to_h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_4h_to_h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_h_to_4h.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.mlp.dense_h_to_4h.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.post_attention_layernorm.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.dense.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.dense.weight": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.query_key_value.bias": "model-00003-of-00003.safetensors",
    "transformer.h.29.self_attention.query_key_value.weight": "model-00003-of-00003.safetensors",
    "transformer.h.3.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.3.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.4.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.5.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.6.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.7.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.8.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.input_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_4h_to_h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_4h_to_h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_h_to_4h.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.mlp.dense_h_to_4h.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.post_attention_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.dense.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.dense.weight": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.query_key_value.bias": "model-00001-of-00003.safetensors",
    "transformer.h.9.self_attention.query_key_value.weight": "model-00001-of-00003.safetensors",
    "transformer.ln_f.bias": "model-00003-of-00003.safetensors",
    "transformer.ln_f.weight": "model-00003-of-00003.safetensors",
    "transformer.word_embeddings.weight": "model-00001-of-00003.safetensors",
    "transformer.word_embeddings_layernorm.bias": "model-00001-of-00003.safetensors",
    "transformer.word_embeddings_layernorm.weight": "model-00001-of-00003.safetensors"
  }
}
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff