---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- llama-3.1
- sql
- fine-tuned
- agent
- unsloth
- text-generation
language:
- en
pipeline_tag: text-generation
datasets:
- custom
metrics:
- loss
---

# Better SQL Agent - Llama 3.1 8B

## Training Results

- **Training Samples**: 19,480 (SQL analytics + technical conversations)
- **Hardware**: 4x NVIDIA A10G GPUs (96 GB VRAM total)

## Model Description

This is a fine-tuned version of **Meta-Llama-3.1-8B-Instruct**, optimized for:

- **SQL query generation and optimization**
- **Data analysis and insights**
- **Technical assistance and debugging**
- **Tool-based workflows**

## Training Configuration

- **Base Model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Training Method**: LoRA (Low-Rank Adaptation)
  - Rank: 16, Alpha: 32, Dropout: 0.05
- **Quantization**: 4-bit base weights with BF16 training precision
- **Context Length**: 128K tokens (inherited from the base model, which supports 128K natively)
- **Optimizer**: AdamW with a cosine learning-rate schedule

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model
model_name = "abhishekgahlot/better-sql-agent-llama"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# Llama 3.1 chat format: each header is followed by a blank line
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Create a SQL query to find the top 5 customers by total revenue in 2024:<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

# The prompt already starts with <|begin_of_text|>, so skip the tokenizer's own BOS.
# Move the inputs to the model's device (needed when device_map="auto" places it on GPU).
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

## Performance Metrics

| Metric | Value |
|--------|-------|
| **Starting Loss** | 1.53 |
| **Final Loss** | 0.0508 |
| **Loss Reduction** | **96.7%** |
| **Training Time** | 8.9 hours |

## Use Cases

- **SQL Generation**: Create complex queries from natural language
- **Data Analysis**: Generate insights and analytical queries
- **Code Assistance**: Debug and optimize SQL code
- **Technical Support**: Answer database and analytics questions
- **Learning Aid**: Explain SQL concepts and best practices

## Training Data

The model was trained on a curated dataset of **19,480 examples**, including:

- SQL query generation tasks
- Data analysis conversations
- Technical problem-solving dialogues
- Tool usage patterns and workflows

## Optimization Features

- **4-bit Quantization**: Reduced memory footprint
- **Flash Attention**: Faster, memory-efficient attention kernels
- **Mixed Precision**: BF16 training for efficiency

## License

This model inherits the **Llama 3.1 license** from the base model. Please review the [official license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) for usage terms.

## Acknowledgments

- Based on Meta's Llama 3.1 8B Instruct model

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the model author.

---
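## LoRA Settings in Numbers

The LoRA values under Training Configuration (rank 16, alpha 32) are easier to interpret with a concrete layer in mind. The stdlib-only sketch below assumes standard LoRA, where the low-rank update `B @ A` is scaled by `alpha / r` (the default convention in Hugging Face PEFT when rsLoRA is off). The 4096 x 4096 shape is an illustrative projection size based on Llama 3.1 8B's hidden size; this card does not say which modules were adapted. The helper names `lora_scaling` and `lora_params` are hypothetical.

```python
# Sketch: what "Rank: 16, Alpha: 32" implies for one LoRA-adapted linear layer.
# Assumption: standard LoRA, update = (alpha / r) * (B @ A), with
# A of shape (r x d_in) and B of shape (d_out x r).


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds: A has r*d_in entries, B has d_out*r."""
    return r * d_in + d_out * r


R, ALPHA = 16, 32
# Illustrative shape only: a square projection at Llama 3.1 8B's hidden size.
D_IN = D_OUT = 4096

scale = lora_scaling(ALPHA, R)
added = lora_params(D_IN, D_OUT, R)
frozen = D_IN * D_OUT  # the base weight matrix stays frozen

print(f"scaling = {scale}")
print(f"LoRA params = {added:,} vs frozen weights = {frozen:,}")
print(f"trainable fraction = {added / frozen:.2%}")
```

With these settings the update is scaled by 2.0, and each adapted 4096 x 4096 projection trains 131,072 parameters against 16,777,216 frozen ones (about 0.78%).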
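## Checking the Loss Reduction

The **Loss Reduction** row in Performance Metrics is the relative drop from the starting loss to the final loss. A quick recomputation from the two reported values (the function name is illustrative):

```python
def relative_reduction(start: float, final: float) -> float:
    """Fraction by which the loss dropped, relative to its starting value."""
    return (start - final) / start


start_loss = 1.53   # "Starting Loss" from the table
final_loss = 0.0508  # "Final Loss" from the table

reduction = relative_reduction(start_loss, final_loss)
print(f"Loss reduction: {reduction:.1%}")  # prints "Loss reduction: 96.7%"
```

This is a relative change in loss, not an accuracy score on generated queries.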
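## Validating Generated SQL

Before relying on a generated query, one option is to execute it against a small fixture database and compare its rows with a known answer. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `customers`/`orders` schema (not part of this model or its training data). `TOP_CUSTOMERS_2024` is a hand-written reference query for the Quick Start prompt, not model output, and `run_query` is an illustrative helper.

```python
import sqlite3

# Hypothetical schema and fixture rows, for illustration only.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    order_date TEXT NOT NULL,  -- ISO-8601 date string
    amount REAL NOT NULL
);
"""

ROWS_CUSTOMERS = [(1, "Acme"), (2, "Globex"), (3, "Initech")]
ROWS_ORDERS = [
    (1, 1, "2024-01-15", 500.0),
    (2, 1, "2024-06-01", 250.0),
    (3, 2, "2024-03-10", 900.0),
    (4, 3, "2023-12-31", 5000.0),  # outside 2024, must be excluded
    (5, 3, "2024-02-02", 100.0),
]

# Hand-written reference answer for the Quick Start prompt (not model output).
TOP_CUSTOMERS_2024 = """
SELECT c.name, SUM(o.amount) AS total_revenue
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01' AND o.order_date < '2025-01-01'
GROUP BY c.id, c.name
ORDER BY total_revenue DESC
LIMIT 5;
"""


def run_query(sql: str) -> list[tuple]:
    """Execute `sql` against a fresh in-memory fixture DB and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO customers VALUES (?, ?)", ROWS_CUSTOMERS)
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS_ORDERS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(run_query(TOP_CUSTOMERS_2024))
```

To check a model response, extract the SQL text from `response` and pass it to `run_query`; comparing its rows with the reference result catches queries that parse but answer the wrong question. The fixture uses the SQLite dialect, so queries written for another engine may need adjusting first.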