Model Description

This model is a fine-tuned version of unsloth/Meta-Llama-3.1-8B, optimized for Text-to-SQL generation. Fine-tuning was performed with the Unsloth library using LoRA (Low-Rank Adaptation) for parameter-efficient training, on the first 5,000 rows of the Clinton/Text-to-sql-v1 dataset.

  • Developed by: Vedant Rajpurohit
  • Model type: Causal Language Model
  • Language(s): English
  • Fine-tuned from model: unsloth/Meta-Llama-3.1-8B
  • Model size: 8.03B parameters
  • Precision: FP16

Direct Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "Vedant3907/Text-to-Sql-llama3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)

model.eval()

# Define your test prompt
sql_prompt = """Below are SQL table schemas paired with instruction that describes a task.
Using valid SQLite, write a response that appropriately completes the request for the provided tables.

### Instruction: What is the 2007 result when the 2010 result was 2r, at the US Open?
### Input: CREATE TABLE table_name_91 ( tournament VARCHAR )
### Response:"""

# Tokenize input
inputs = tokenizer(sql_prompt, return_tensors="pt").to(model.device)

# Generate SQL query
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # Use sampling for more diverse outputs
)

# Decode and print the generated output
generated_sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated SQL Query:")
print(generated_sql)

# Example output:
# SELECT 2007 FROM table_name_91 WHERE 2010 = "2r" AND tournament = "us open"
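
Note that model.generate returns the prompt tokens followed by the completion, so the decoded string above repeats the entire prompt. A minimal way to keep only the generated SQL is to slice off the prompt tokens before decoding:

# Keep only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
sql_only = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(sql_only.strip())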

Bias, Risks, and Limitations

  • The model was trained on only the first 5,000 rows of the dataset, for 250 steps, so its coverage of SQL dialects and schema patterns is limited.
  • The model may generate incorrect or ambiguous SQL queries for instructions that are unclear or outside the training distribution.

Training Details

Dataset

  • Dataset Name: Clinton/Text-to-sql-v1
  • Rows Used: First 5000 rows of the dataset.
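
For reference, here is a minimal sketch of how that subset can be loaded and formatted into the prompt template shown in Direct Use. The column names (instruction, input, response) are assumptions about the dataset schema; check the dataset card for the actual fields.

from datasets import load_dataset

# Load only the first 5000 rows of the train split
dataset = load_dataset("Clinton/Text-to-sql-v1", split="train[:5000]")

def format_example(example):
    # Column names are assumed; adjust to the dataset's actual schema
    return {
        "text": (
            "Below are SQL table schemas paired with instruction that describes a task.\n"
            "Using valid SQLite, write a response that appropriately completes the request for the provided tables.\n\n"
            f"### Instruction: {example['instruction']}\n"
            f"### Input: {example['input']}\n"
            f"### Response: {example['response']}"
        )
    }

dataset = dataset.map(format_example)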

Training Procedure

The model was fine-tuned using the Unsloth library with LoRA adapters, enabling efficient training. Below are the hyperparameters used:

from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,  # 4% of 250 steps
    max_steps = 250,
    learning_rate = 1e-4,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 10,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none",
)
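
The adapter configuration itself was not published; the sketch below shows the usual Unsloth LoRA setup that these arguments would plug into. The rank, alpha, target modules, and the trl SFTTrainer signature (an older version that accepts dataset_text_field directly) are assumptions, not the exact settings used for this model.

from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load the base model with Unsloth (4-bit loading is assumed here)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters; r, lora_alpha, and target_modules are assumed values
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,        # formatted as in the Dataset section
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = training_args,           # the TrainingArguments shown above
)
trainer.train()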

Hardware

  • Trained on Google Colab using a single NVIDIA T4 GPU.