---
base_model: unsloth/Qwen3-1.7B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: apache-2.0
language:
- en
---

# Fine-tuning Qwen3-1.7B for Text-to-SQL Task

This project demonstrates fine-tuning the `Qwen3-1.7B` language model on a combined and preprocessed dataset for Text-to-SQL generation. The goal is to train the model to generate SQL queries from natural language questions, given the database schema.

## Dataset

We used the [fahmiaziz/text2sql-dataset](https://huggingface.co/datasets/fahmiaziz/text2sql-dataset), which merges examples from:

- **WikiSQL**
- **BIRD**
- **Spider**
- **Synthetic SQL samples**

Before training, the dataset was **cleaned and filtered** by (see the sketch after this list):

- Removing DDL/DML examples (`INSERT`, `UPDATE`, `DELETE`, etc.)
- Deduplicating examples based on **semantic hashing of both SQL and questions**
- Keeping only SELECT-style analytical queries
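
The snippet below is a minimal, illustrative sketch of this preprocessing; the helper and column names (`question`, `sql`) are assumptions, and the "semantic hash" is approximated here by hashing a normalized string rather than the exact method used.

```python
import hashlib
import re

# DDL/DML keywords to drop (assumed keyword list).
DDL_DML = re.compile(r"^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)\b", re.IGNORECASE)

def keep_example(sql: str) -> bool:
    """Keep only SELECT-style analytical queries."""
    return sql.strip().upper().startswith("SELECT") and not DDL_DML.match(sql)

def semantic_hash(question: str, sql: str) -> str:
    """Hash a case/whitespace-normalized (question, SQL) pair for deduplication."""
    key = " ".join(question.lower().split()) + " || " + " ".join(sql.lower().split())
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def clean(examples: list[dict]) -> list[dict]:
    """Drop DDL/DML examples and deduplicate (question, SQL) pairs."""
    seen, kept = set(), []
    for ex in examples:
        if not keep_example(ex["sql"]):
            continue
        h = semantic_hash(ex["question"], ex["sql"])
        if h not in seen:
            seen.add(h)
            kept.append(ex)
    return kept
```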

## Training Format

Since Qwen3 models produce a two-part output (a `<think>` block followed by the final answer), and our dataset does not contain intermediate reasoning, we left the `<think>` section **empty** during fine-tuning.

### Example Format

```
<|im_start|>system
Given the database schema and the user question, generate the corresponding SQL query.
<|im_end|>
<|im_start|>user
[SCHEMA]
CREATE TABLE Inclusive_Housing (Property_ID INT, Inclusive VARCHAR(10), Property_Size INT);
INSERT INTO Inclusive_Housing (Property_ID, Inclusive, Property_Size)
VALUES (1, 'Yes', 900), (2, 'No', 1100), (3, 'Yes', 800), (4, 'No', 1200);

[QUESTION]
What is the average property size in inclusive housing areas?
<|im_end|>
<|im_start|>assistant
<think>

</think>
SELECT AVG(Property_Size) FROM Inclusive_Housing WHERE Inclusive = 'Yes';
<|im_end|>
```
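
The sketch below shows one way to render an example into this format; the dataset column names (`schema`, `question`, `sql`) are assumptions about the merged dataset, and the tokenizer's chat template could be used instead.

```python
# Illustrative only: builds the training "text" field with an empty <think> block.
PROMPT_TEMPLATE = (
    "<|im_start|>system\n"
    "Given the database schema and the user question, generate the corresponding SQL query.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "[SCHEMA]\n{schema}\n\n"
    "[QUESTION]\n{question}\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>\n\n</think>\n"
    "{sql}\n"
    "<|im_end|>"
)

def format_example(example: dict) -> dict:
    # Column names are assumed; adjust to the actual dataset fields.
    return {"text": PROMPT_TEMPLATE.format(**example)}

# Usage (hypothetical): dataset = dataset.map(format_example)
```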

## Training Configuration

Due to hardware limitations, **full model training** was not possible. Instead, we applied **LoRA (Low-Rank Adaptation)** with the following configuration (a setup sketch follows this list):

- **LoRA rank (`r`)**: 128
- **LoRA alpha**: 256
- **Hardware**: 2x T4 GPUs on Kaggle
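
A minimal Unsloth setup sketch is shown below; only `r = 128` and `lora_alpha = 256` come from this card, while the target modules, 4-bit loading, and dropout are common defaults and should be treated as assumptions.

```python
from unsloth import FastLanguageModel

# Load the base model (4-bit loading is an assumption, chosen to fit T4 memory).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-1.7B",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters; r and lora_alpha match the values reported above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=256,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```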

### Training Hyperparameters

```
per_device_train_batch_size = 6,
gradient_accumulation_steps = 2,
warmup_steps = 5,
max_steps = 500,
num_train_epochs = 3,
learning_rate = 1e-4,
fp16 = not is_bf16_supported(),
bf16 = is_bf16_supported(),
logging_steps = 25,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs_v4",
dataset_text_field = "text",
max_seq_length = 1024,
```
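
These settings map onto the usual Unsloth + TRL training loop roughly as sketched below; the exact argument placement varies between TRL versions, and `is_bfloat16_supported` is Unsloth's helper corresponding to `is_bf16_supported()` above. Note that when both `max_steps` and `num_train_epochs` are set, `max_steps` takes precedence, which matches training stopping at step 500.

```python
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,                    # LoRA-wrapped model from the sketch above
    tokenizer=tokenizer,
    train_dataset=dataset,          # dataset with the formatted "text" column
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        per_device_train_batch_size=6,
        gradient_accumulation_steps=2,
        warmup_steps=5,
        max_steps=500,              # takes precedence over num_train_epochs
        num_train_epochs=3,
        learning_rate=1e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=25,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs_v4",
    ),
)
trainer.train()
```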

## Training Results

```
global_step=500,
training_loss=0.5783241882324218
```

## Evaluation

We evaluated the model using the **Exact Match (EM)** score on a manually selected sample of 100 examples, obtaining a score of **50%**. A sketch of the metric follows.
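
A minimal sketch of the Exact Match computation is given below; the normalization (lowercasing, whitespace collapsing, trailing-semicolon stripping) is an assumption, since the original evaluation script is not included in this card.

```python
def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and strip a trailing semicolon."""
    return " ".join(sql.strip().rstrip(";").lower().split())

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference after normalization."""
    hits = sum(normalize_sql(p) == normalize_sql(r)
               for p, r in zip(predictions, references))
    return hits / len(references)

# On the 100 hand-picked examples this yielded an EM of 0.50.
```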

---

## Notes

In future iterations, we plan to:

* Add more complex, long-context schemas
* Perform full fine-tuning

# Uploaded finetuned model

- **Developed by:** fahmiaziz
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3-1.7B

This Qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)