Model Description

This model is a fine-tuned version of unsloth/Meta-Llama-3.1-8B, optimized for Text-to-SQL generation. Fine-tuning was performed with the Unsloth library using LoRA (Low-Rank Adaptation) for parameter-efficient training, on the first 5,000 rows of the Clinton/Text-to-sql-v1 dataset.

  • Developed by: Vedant Rajpurohit
  • Model type: Causal Language Model
  • Language(s): English
  • Fine-tuned from model: unsloth/Meta-Llama-3.1-8B
  • Model size: 8.03B parameters
  • Precision: FP16

Direct Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "Vedant3907/Text-to-Sql-llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)

model.eval()

# Define your test prompt
sql_prompt = """Below are SQL table schemas paired with instruction that describes a task.
Using valid SQLite, write a response that appropriately completes the request for the provided tables.

### Instruction: What is the 2007 result when the 2010 result was 2r, at the US Open?
### Input: CREATE TABLE table_name_91 ( tournament VARCHAR )
### Response:"""

# Tokenize input
inputs = tokenizer(sql_prompt, return_tensors="pt").to(model.device)

# Generate SQL query
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # Use sampling for more diverse outputs
)

# Decode and print the generated output
generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated SQL Query:")
print(generated_sql)

# Example output:
# SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
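
Note that model.generate returns the prompt tokens followed by the completion, so the decoded string above repeats the entire prompt. A minimal way to keep only the generated SQL is to slice off the prompt tokens before decoding:

# Keep only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
sql_only = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(sql_only.strip())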

Bias, Risks, and Limitations

  • The model was trained on only the first 5,000 rows of the dataset, for 250 steps, so its coverage of SQL dialects and schema patterns is limited.
  • The model may generate incorrect or ambiguous SQL queries for instructions that are unclear or outside the training distribution.

Training Details

Dataset

  • Dataset Name: Clinton/Text-to-sql-v1
  • Rows Used: First 5000 rows of the dataset.
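
For reference, here is a minimal sketch of how that subset can be loaded and formatted into the prompt template shown in Direct Use. The column names (instruction, input, response) are assumptions about the dataset schema; check the dataset card for the actual fields.

from datasets import load_dataset

# Load only the first 5000 rows of the train split
dataset = load_dataset("Clinton/Text-to-sql-v1", split="train[:5000]")

def format_example(example):
    # Column names are assumed; adjust to the dataset's actual schema
    return {
        "text": (
            "Below are SQL table schemas paired with instruction that describes a task.\n"
            "Using valid SQLite, write a response that appropriately completes the request for the provided tables.\n\n"
            f"### Instruction: {example['instruction']}\n"
            f"### Input: {example['input']}\n"
            f"### Response: {example['response']}"
        )
    }

dataset = dataset.map(format_example)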

Training Procedure

The model was fine-tuned using the Unsloth library with LoRA adapters, enabling efficient training. Below are the hyperparameters used:

from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,  # 4% of 250 steps
    max_steps = 250,
    learning_rate = 1e-4,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none",
)
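
The adapter configuration itself was not published; the sketch below shows the usual Unsloth LoRA setup that these arguments would plug into. The rank, alpha, target modules, and the trl SFTTrainer signature (an older version that accepts dataset_text_field directly) are assumptions, not the exact settings used for this model.

from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load the base model with Unsloth (4-bit loading is assumed here)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters; r, lora_alpha, and target_modules are assumed values
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,        # formatted as in the Dataset section
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = training_args,           # the TrainingArguments shown above
)
trainer.train()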

Hardware

  • Trained on Google Colab using a single NVIDIA T4 GPU.