Gemma 2 2B Fine-Tuned on Indian Laws
This is a fine-tuned version of the Gemma 2 2B model, trained on a custom dataset of 24.6k QnA pairs on Indian laws. It is designed to assist with legal research, drafting, and analysis specifically in Indian legal contexts. The model was fine-tuned using Unsloth's training configuration for faster training and inference.
Sample Run:
Model Details
- Model Name: Gemma 2 2B Indian Law
- Base Model: Gemma 2 2B (google/gemma-2-2b)
- Model Size: 5.2 GB (LoRA weights merged at 16-bit)
- Fine-Tuning Dataset: Custom dataset derived from Indian legal documents, case laws, and statutes: https://huggingface.co/datasets/viber1/indian-law-dataset?row=19
- Task: Legal text generation, legal QnA, legal text understanding, and legal research.
- Language: English
- Framework: Hugging Face Transformers, Unsloth
How to Use
You can use this model with the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Initialize the model and tokenizer
model_name = "Ananya8154/Gemma-2-2B-Indian-Law"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt template with a question slot and an answer slot;
# the answer slot is left empty so the model completes it
prompt = """
Below is a question related to law, please provide an answer.
### Question:
{}
### Answer:
{}
"""

# Tokenize the input text
user_query = "When did the North-Eastern Areas (Reorganisation) Act, 1971 come into effect?"
inputs = tokenizer([prompt.format(user_query, "")], return_tensors="pt")

# Generate and decode the response
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs)[0])
```
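Note that `batch_decode` returns the whole sequence, prompt included. One way to pull out only the generated answer is to split on the `### Answer:` marker; a minimal sketch, where `extract_answer` is an illustrative helper (not part of this repo) and the `decoded` string is a stand-in for real model output:

```python
def extract_answer(decoded: str) -> str:
    """Return only the text generated after the '### Answer:' marker."""
    marker = "### Answer:"
    answer = decoded.split(marker, 1)[-1]
    # Strip the EOS token Gemma appends, plus surrounding whitespace
    return answer.replace("<eos>", "").strip()

# Stand-in for tokenizer.batch_decode(outputs)[0]
decoded = "### Question:\nWhen did the Act come into effect?\n### Answer:\nIt came into effect in 1972.<eos>"
print(extract_answer(decoded))  # It came into effect in 1972.
```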
Training Details
- Fine-Tuning Objective: The model was fine-tuned on a dataset of Indian legal texts to improve its performance in generating legal content, answering legal questions, and summarizing case laws.
- Training Duration: 1.5 hours on 4 Tesla T4 GPUs
- Hardware Used: Tesla T4 GPUs (16 GB VRAM each)
- Optimizer: 8-bit Adam
- Learning Rate: 2e-4
- bfloat16 Enabled: False
- Batch Size: 8
- Num Training Epochs: 1
- Training Loss: 0.54
- Learning Rate Scheduler: Linear
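For reference, the hyperparameters above can be collected into a single config. A sketch using Transformers/TRL-style argument names; the key names are illustrative assumptions, only the values come from this card:

```python
# Training hyperparameters from this model card, gathered in one place.
# Key names follow common Transformers/TRL conventions (illustrative).
training_config = {
    "optim": "adamw_8bit",            # 8-bit Adam
    "learning_rate": 2e-4,
    "lr_scheduler_type": "linear",
    "per_device_train_batch_size": 8,
    "num_train_epochs": 1,
    "bf16": False,                    # bfloat16 disabled
}
```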
- Dataset Description:
The dataset consists of:
- Fundamental queries on Indian law
- Special Acts information
- Legal articles and commentaries
- Past cases and proceedings
Intended Use
This model is designed for:
- Legal QnA
- Querying previous cases
- Question answering related to Special Acts, articles, and commentaries
Out-of-Scope Uses
- This model is not intended for providing legal advice or making decisions in a legal context without human supervision.
- It should not be used for non-Indian legal contexts without additional fine-tuning.
Ethical Considerations
- Biases in Dataset: The dataset used may reflect biases present in Indian legal texts, such as systemic discrimination in historical case laws.
- Privacy Concerns: Ensure that the data used in fine-tuning does not contain sensitive or personally identifiable information (PII).
- Hallucination Risks: The model may generate false or misleading legal information. Users should verify outputs with authoritative sources.
Contact
- About Me
- Ananya Kumar
- Email: [email protected]
- LinkedIn: https://www.linkedin.com/in/ananya8154/