---
base_model: llama-3B
tags:
- text-generation
- sql
- peft
- lora
- rslora
- unsloth
- llama3
- instruction-tuned
license: mit
---

# SQLGenie - LoRA Fine-Tuned LLaMA 3B for Text-to-SQL Generation

**SQLGenie** is a lightweight LoRA adapter fine-tuned on top of Unsloth’s 4-bit LLaMA 3 (3B) model. It is designed to convert natural language instructions into valid SQL queries with minimal compute overhead, making it ideal for integration into data-driven applications or chat interfaces.
It has been trained on over 100K examples spanning a variety of domains, such as education, technology, health, and more.

## Model Highlights

- **Base model**: `Llama3 3B`
- **Tokenizer**: Compatible with `Llama3 3B`
- **Fine-tuned for**: Text-to-SQL conversion
- **Accuracy**: > 85%
- **Language**: English
- **Format**: `safetensors`


## Model Dependencies

- **Python version**: `3.10`
- **Libraries**: `unsloth` (install with `pip install unsloth`)
  
### Model Description

- **Developed by:** Merwin
- **Model type:** PEFT adapter (LoRA) for Causal Language Modeling
- **Language(s):** English
- **Fine-tuned from model:** [unsloth/llama-3.2-3b-unsloth-bnb-4bit](https://huggingface.co/unsloth/llama-3.2-3b-unsloth-bnb-4bit)

### Model Sources

- **Repository:** https://huggingface.co/mervp/SQLGenie

## Uses

### Direct Use

This model can be directly used to generate SQL queries from natural language prompts. Example use cases include:
- Building AI assistants for databases
- Enhancing query tools with NL-to-SQL capabilities
- Automating analytics queries in various domains


## Bias, Risks, and Limitations

While the model has been fine-tuned for SQL generation, it may:
- Produce invalid SQL in some edge cases
- Infer table or column names that are not present in the prompt
- Assume a generic SQL dialect (closest to MySQL/PostgreSQL)


### Recommendations
- Always validate and test generated queries before execution in a production database.
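As a lightweight safeguard, a generated query can be syntax-checked against the schema before it ever touches a real database. The sketch below is one possible approach (not part of this model's tooling): it uses Python's built-in `sqlite3` module to build the prompt's schema in an in-memory database and then `EXPLAIN` the query, which parses and plans it without executing it. Note that SQLite's dialect differs from MySQL/PostgreSQL, so treat this as a rough sanity check rather than full validation; the helper name is illustrative.

```python
import sqlite3

def syntax_check(query: str, schema: str) -> bool:
    """Rough syntax check for a generated query.

    Builds the schema in an in-memory SQLite database, then asks
    SQLite to EXPLAIN the query (parse and plan, but not execute).
    Returns False on any SQL error instead of raising.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)        # create tables from the schema context
        conn.execute(f"EXPLAIN {query}")  # parses the query without running it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

Even when a query passes this check, it should still be reviewed (or run against a staging copy) before production use, since a syntactically valid query can still be semantically wrong.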


Thanks for visiting and downloading this model!
If this model helped you, please consider leaving a 👍 like. Your support helps this model reach more developers and encourages further improvements.

---

## How to Get Started with the Model

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mervp/SQLGenie",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

prompt = """You are a text to SQL query translator.
             Users will ask you questions in English
             and you will generate a SQL query based on their question.
             Keep the SQL simple; the schema context has been provided to you.


### User Question:
{}

### Sql Context:
{}

### Sql Query:
{}
"""

question = "List the names of customers who have an account balance greater than 6000."
schema = """
CREATE TABLE socially_responsible_lending (
    customer_id INT,
    name VARCHAR(50),
    account_balance DECIMAL(10, 2)
);

INSERT INTO socially_responsible_lending VALUES
    (1, 'james Chad', 5000),
    (2, 'Jane Rajesh', 7000),
    (3, 'Alia Kapoor', 6000),
    (4, 'Fatima Patil', 8000);
"""

inputs = tokenizer(
    [prompt.format(question, schema, "")],
    return_tensors="pt",
    padding=True,
    truncation=True
).to("cuda")

output = model.generate(
    **inputs,
    max_new_tokens=256,  # upper bound on generated query length
    temperature=0.2,     # low temperature for mostly deterministic SQL
    top_p=0.9,
    top_k=50,
    do_sample=True,
)

decoded_output = tokenizer.decode(output[0], skip_special_tokens=True)

if "### Sql Query:" in decoded_output:
    sql_query = decoded_output.split("### Sql Query:")[-1].strip()
else:
    sql_query = decoded_output.strip()

print(sql_query)