---
library_name: transformers
tags:
- lucie
- lucie-boosted
- llama
license: apache-2.0
datasets:
- jpacifico/french-orca-dpo-pairs-revised
language:
- fr
- en
---
### Lucie-Boosted-7B-Instruct

Post-training optimization of [OpenLLM-France/Lucie-7B-Instruct](https://huggingface.co/OpenLLM-France/Lucie-7B-Instruct), the instruct version of the Lucie foundation model.  
DPO fine-tuning using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) preference dataset.  
Fine-tuning on French data also improves the model's overall performance.  
*Lucie-7B has a context size of 32K tokens.*  
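
For reference, the general shape of such a DPO run with Hugging Face TRL looks roughly like the sketch below. This is an illustration, not the exact training recipe used for this model: the hyperparameters are placeholders, and the dataset is assumed to expose `prompt`/`chosen`/`rejected` columns (map or rename the columns first if it uses a different schema).

```python
# Minimal DPO fine-tuning sketch with Hugging Face TRL (recent versions).
# Illustrative only: hyperparameters are placeholders, not the author's recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "OpenLLM-France/Lucie-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Assumed to provide "prompt"/"chosen"/"rejected" columns as DPOTrainer expects
dataset = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

args = DPOConfig(
    output_dir="lucie-boosted-dpo",
    beta=0.1,                        # strength of the preference penalty
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference copy is handled internally
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```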

### OpenLLM Leaderboard

coming soon  

### MT-Bench

coming soon

### Usage

You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_14B_inference_test_colab.ipynb) 

You can also run Lucie-Boosted-7B-Instruct using the following code:

```python
import transformers
from transformers import AutoTokenizer

# This model card's repository id
new_model = "jpacifico/Lucie-Boosted-7B-Instruct"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text (max_new_tokens bounds the completion only,
# unlike max_length, which also counts the prompt tokens)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_new_tokens=200,
)
print(sequences[0]['generated_text'])
```
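
Since the model is optimized for French, the same pipeline can be queried in French. A minimal variation, reusing the tokenizer and pipeline created above (the prompt text is just an example):

```python
# French prompt: "You are a helpful and concise assistant." /
# "What is a large language model?"
message = [
    {"role": "system", "content": "Tu es un assistant utile et concis."},
    {"role": "user", "content": "Qu'est-ce qu'un grand modèle de langage ?"}
]
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
print(pipeline(prompt, do_sample=True, temperature=0.7, top_p=0.9,
               max_new_tokens=200)[0]["generated_text"])
```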

### Limitations

The Lucie-Boosted model is a quick demonstration that the Lucie foundation model can be easily fine-tuned to achieve compelling performance.  
It does not have any moderation mechanism.  

- **Developed by:** Jonathan Pacifico, 2025  
- **Model type:** LLM 
- **Language(s) (NLP):** French, English  
- **License:** Apache-2.0