---
base_model: unsloth/Llama-3.2-3B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
license: apache-2.0
language:
- en
---
|
# Llama-3.2-3B Finetuned Model

## 1. Introduction

This model is a finetuned version of Llama-3.2-3B-Instruct. It has been trained to answer university course-related queries, providing details on course content, fee structures, duration, and campus options, along with links to the corresponding course pages. Finetuning on a tailored dataset gives the model its domain-specific accuracy.
|

---

## GGUF Model

This is a GGUF build of the model for running offline with Ollama. A Modelfile is also provided so you can host and run the model locally with Ollama.
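
As a sketch, a minimal Ollama Modelfile for this model might look like the following (the GGUF filename and model name are illustrative; use the actual file downloaded from this repo):

```
# Modelfile — point FROM at the downloaded GGUF file (filename is illustrative)
FROM ./Llama-3.2-3B-Instruct-finetuned.Q4_K_M.gguf

# Optional generation settings
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

You can then build and run it locally with `ollama create course-llama -f Modelfile` followed by `ollama run course-llama` (the name `course-llama` is just an example).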
|

## 2. Dataset Used for Finetuning

The finetuning of the Llama-3.2-3B model was performed using a private dataset obtained through web scraping. Data was collected from the University of Westminster website and included:

- Course titles
- Campus details
- Duration options (full-time, part-time, distance learning)
- Fee structures (for UK and international students)
- Course descriptions
- Direct links to course pages

This dataset was carefully cleaned and formatted to improve the model's ability to give precise responses to user queries.
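
The dataset itself is private, but a cleaned, chat-formatted record plausibly looked something like the following (all field values here are invented for illustration):

```python
# Illustrative (invented) example of one cleaned, chat-formatted training record.
record = {
    "messages": [
        {
            "role": "user",
            "content": "What is the duration of the AI, Data and Communication MA?",
        },
        {
            "role": "assistant",
            "content": (
                "The AI, Data and Communication MA is offered full-time "
                "over one year. Course page: https://www.westminster.ac.uk/..."
            ),
        },
    ]
}

# Each record pairs a scraped course question with a grounded answer.
print(record["messages"][0]["role"])
```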
|

---

## 3. How to Use This Model

To use the Llama-3.2-3B finetuned model, follow the steps below:
|
|
|

```python
from transformers import TextStreamer

def chatml(question, model, tokenizer):
    # Build a single-turn chat and apply the model's chat template
    messages = [{"role": "user", "content": question}]

    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    # Show the formatted prompt that will be fed to the model
    print(tokenizer.decode(inputs[0]))

    # Stream generated tokens to stdout as they are produced
    text_streamer = TextStreamer(tokenizer, skip_special_tokens=True,
                                 skip_prompt=True)
    return model.generate(input_ids=inputs,
                          streamer=text_streamer,
                          max_new_tokens=512)


# Use the following example to test the model:
question = "Does the University of Westminster offer a course on AI, Data and Communication MA?"
output = chatml(question, model, tokenizer)
```
|

This setup lets you query the Llama-3.2-3B finetuned model and receive detailed, relevant responses.
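
For reference, `apply_chat_template` renders the messages list into the Llama 3 prompt format before tokenization. A minimal sketch of that rendering (not the tokenizer itself, which handles this for you) looks roughly like this:

```python
# Rough sketch of the prompt string apply_chat_template produces for
# Llama 3 models; the actual template may add a default system message.
def render_llama3_prompt(messages):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # add_generation_prompt=True appends an open assistant header,
    # cueing the model to generate the assistant's reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = render_llama3_prompt([{"role": "user", "content": "Hi"}])
print(prompt)
```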
|

---

# Uploaded model

- **Developed by:** roger33303
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Llama-3.2-3B-Instruct

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
|
|