---
base_model:
- tiiuae/falcon-11B
library_name: transformers
tags:
- mergekit
- merge
- lazymergekit
license: apache-2.0
language:
- nl
---
# sliced

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the passthrough merge method.

### Models Merged

The following models were included in the merge:
* [tiiuae/falcon-11B](https://huggingface.co/tiiuae/falcon-11B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [0, 25]
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [56, 59]
merge_method: passthrough
dtype: bfloat16
```
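The merge can be reproduced with mergekit. Below is a minimal sketch, assuming mergekit is installed (`pip install mergekit`) and the configuration above is written to a local file; the config filename and output directory are arbitrary placeholder names.

```python
# Sketch: reproduce the passthrough slice with mergekit's documented CLI entry point.
import subprocess
from pathlib import Path

config = """\
slices:
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [0, 25]
  - sources:
      - model: tiiuae/falcon-11B
        layer_range: [56, 59]
merge_method: passthrough
dtype: bfloat16
"""

# Placeholder file and output directory names.
Path("slice-config.yml").write_text(config)
subprocess.run(
    ["mergekit-yaml", "slice-config.yml", "./falcon2-5.5b-sliced"],
    check=True,
)
```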

[PruneMe](https://github.com/arcee-ai/PruneMe) was used to investigate layer similarity on 2,000 samples from the Dutch (nl) subset of wikimedia/wikipedia. The layer ranges to prune were chosen from this analysis to reduce model size while preserving performance.

![Layer Similarity Plot](https://cdn-uploads.huggingface.co/production/uploads/660c0a02cf274b3ab77dd6b7/PF3SzEhQRJPXyYi2KqS1A.png)
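The sketch below illustrates the idea behind such an analysis: run samples through the base model, collect hidden states, and measure how little each layer changes the representation (here with cosine similarity); blocks of highly similar layers are candidates for pruning. This is an illustration only, not PruneMe's actual code, and the sample text and helper name are assumptions.

```python
# Illustrative layer-similarity sketch (not PruneMe's implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"  # assumption: the analysis is run on the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # requires accelerate
)

def layer_similarities(text: str) -> list[float]:
    """Cosine similarity between the hidden states entering and leaving each layer."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    hidden = outputs.hidden_states  # embeddings + one tensor per transformer layer
    sims = []
    for i in range(1, len(hidden)):
        a = hidden[i - 1].flatten().float()
        b = hidden[i].flatten().float()
        sims.append(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
    return sims  # high similarity => the layer barely changes the representation

# In practice this is averaged over thousands of samples, not a single sentence.
print(layer_similarities("Een korte Nederlandse voorbeeldzin."))
```

The sliced model itself can be loaded and used like any other `transformers` causal language model: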

```python
import torch
import transformers
from transformers import AutoTokenizer

model = "ssmits/Falcon2-5.5B-Dutch"

# Load the tokenizer and build a bfloat16 text-generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample one completion of up to 200 tokens.
sequences = pipeline(
    "Can you explain the concepts of Quantum Computing?",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

💥 **Falcon LLMs require PyTorch 2.0 for use with `transformers`!**

For fast inference with Falcon, check out [Text Generation Inference](https://github.com/huggingface/text-generation-inference)! Read more in this [blog post](https://huggingface.co/blog/falcon).
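As a hedged illustration, the snippet below queries a running TGI server with `huggingface_hub.InferenceClient`; the local endpoint URL, launch command, and generation parameters are assumptions that depend on how the server is deployed.

```python
# Sketch: query a Text Generation Inference server assumed to be running locally,
# e.g. started with: text-generation-launcher --model-id ssmits/Falcon2-5.5B-Dutch --port 8080
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

reply = client.text_generation(
    "Leg in het kort uit wat een taalmodel is.",
    max_new_tokens=200,
    do_sample=True,
    top_k=10,
)
print(reply)
```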

## Direct Use
Research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbots).

## Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations
Falcon2-5.5B is trained mostly on English, but also on German, Spanish, French, Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

## Recommendations
We recommend that users of Falcon2-5.5B consider finetuning it for the specific set of tasks of interest, and that guardrails and appropriate precautions be taken for any production use.
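As a hedged starting point for such finetuning, the sketch below attaches LoRA adapters with the `peft` library; the target module name, rank, and other hyperparameters are assumptions based on the Falcon architecture, not a tested recipe.

```python
# Sketch: parameter-efficient finetuning setup with LoRA adapters (peft).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "ssmits/Falcon2-5.5B-Dutch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                                # assumed rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # assumption: Falcon-style fused attention projection
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train with transformers.Trainer (or a custom loop) on the task-specific data.
```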

[PruneMe](https://github.com/arcee-ai/PruneMe) was also used to investigate layer similarity on 4,000 samples from the AgentWaller/dutch-oasst1 dataset; the layer ranges to prune were likewise chosen to reduce model size while preserving performance.