ssmits committed
Commit b65012e · verified · 1 parent: 847e1c6

Update README.md

Files changed (1)
  1. README.md (+65, −12)
README.md CHANGED
---
base_model:
- tiiuae/falcon-11B
library_name: transformers
tags:
- mergekit
- merge
- lazymergekit
license: apache-2.0
language:
- nl
---

# Falcon2-5.5B-Dutch

Falcon2-5.5B-Dutch is a pruned version of [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B), created by slicing layers with [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the passthrough merge method, which stacks the selected layer ranges from the source model without any weight interpolation.

### Models Merged

The following models were included in the merge:

* [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)
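
As a quick sanity check on the slicing, the configurations of the base and pruned models can be compared directly; the minimal sketch below only downloads the model configs, not the weights.

```python
from transformers import AutoConfig

# Load only the configurations; no weights are downloaded.
base_cfg = AutoConfig.from_pretrained("tiiuae/falcon-11B")
pruned_cfg = AutoConfig.from_pretrained("ssmits/Falcon2-5.5B-Dutch")

# The passthrough merge keeps a subset of the original decoder layers,
# so the pruned model should report fewer hidden layers than the base.
print("falcon-11B layers:        ", base_cfg.num_hidden_layers)
print("Falcon2-5.5B-Dutch layers:", pruned_cfg.num_hidden_layers)
```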

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  # …
  - sources:
    - model: tiiuae/falcon-11B
      layer_range: [56, 59]
merge_method: passthrough
dtype: bfloat16
```
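
For reference, a merge defined by a YAML file like the one above can be reproduced with mergekit's Python API. The sketch below is an outline under stated assumptions: the file name `falcon-5.5b-dutch.yaml` and the output directory are illustrative, and the exact option names may vary between mergekit versions.

```python
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the slice/passthrough configuration shown above (illustrative file name).
with open("falcon-5.5b-dutch.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Run the merge and write the pruned model to a local directory.
run_merge(
    merge_config,
    out_path="./Falcon2-5.5B-Dutch",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is available
        copy_tokenizer=True,             # copy the tokenizer from the source model
    ),
)
```

In recent mergekit releases the same file can also be passed to the `mergekit-yaml` command-line entry point.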

[PruneMe](https://github.com/arcee-ai/PruneMe) was used to investigate layer similarity on the Dutch (nl) subset of the wikimedia/wikipedia dataset with 2,000 samples; an earlier analysis used the AgentWaller/dutch-oasst1 dataset with 4,000 samples. The layer ranges to prune were determined from this analysis, with the goal of reducing model size while maintaining performance.

![Layer Similarity Plot](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/PF3SzEhQRJPXyYi2KqS1A.png)
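
The underlying idea is to find a block of consecutive layers whose input and output hidden states are nearly identical, so that the block can be removed with minimal effect. The sketch below is a simplified illustration of that measurement, not the PruneMe implementation; the Dutch sample text and the block size of 3 are arbitrary assumptions, and loading the full 11B base model requires substantial memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

text = "Amsterdam is de hoofdstad van Nederland."  # illustrative Dutch sample
block = 3  # number of consecutive layers considered for removal

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
hidden = out.hidden_states  # embedding output plus one tensor per decoder layer

# Average cosine similarity between the hidden states entering a block of
# `block` layers and the hidden states leaving it; high similarity means the
# block changes the representation little and is a good pruning candidate.
for start in range(len(hidden) - block):
    sim = torch.nn.functional.cosine_similarity(
        hidden[start].float(), hidden[start + block].float(), dim=-1
    ).mean().item()
    print(f"layers {start}-{start + block}: mean cosine similarity {sim:.4f}")
```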

```python
import torch
import transformers
from transformers import AutoTokenizer

model = "ssmits/Falcon2-5.5B-Dutch"

# Build a bfloat16 text-generation pipeline for the pruned model.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate` to place the model automatically
)

# Sample one completion, restricting sampling to the 10 most likely tokens per step.
sequences = pipeline(
    "Can you explain the concepts of Quantum Computing?",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**

For fast inference with Falcon, check out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blog post](https://huggingface.co/blog/falcon).
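
Once the model is served with Text Generation Inference, it can be queried from Python via `huggingface_hub`; the endpoint URL and the Dutch prompt below are assumptions for illustration.

```python
from huggingface_hub import InferenceClient

# Point the client at a locally running TGI server (adjust the URL as needed).
client = InferenceClient("http://127.0.0.1:8080")

response = client.text_generation(
    "Leg de geschiedenis van Amsterdam kort uit.",
    max_new_tokens=200,
    temperature=0.7,
)
print(response)
```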

## Direct Use
Research on large language models, and use as a foundation for further specialization and fine-tuning for specific use cases (e.g., summarization, text generation, chatbots).

## Out-of-Scope Use
Production use without adequate assessment of risks and mitigation, and any use case that may be considered irresponsible or harmful.

## Bias, Risks, and Limitations
Falcon2-5.5B was trained mostly on English data, but also on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish data. It will not generalize appropriately to other languages. Furthermore, as it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

## Recommendations
We recommend that users of Falcon2-5.5B consider fine-tuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.
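
One possible fine-tuning recipe is sketched below, attaching a LoRA adapter with `peft` and training with the `transformers` Trainer; the dataset, hyperparameters, and the `query_key_value` target module are assumptions to adapt to the task at hand.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "ssmits/Falcon2-5.5B-Dutch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Falcon tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach a small LoRA adapter instead of updating all weights.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["query_key_value"], task_type="CAUSAL_LM"),
)

# Illustrative Dutch corpus; replace with task-specific data.
dataset = load_dataset("wikimedia/wikipedia", "20231101.nl", split="train[:1000]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="falcon2-5.5b-dutch-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

LoRA keeps the memory footprint of fine-tuning small, which suits a pruned model intended as a lightweight Dutch base.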