Model Card for DistilGPT2 Fine-Tuned on the Indian Constitution

Model Summary

This model is a version of DistilGPT2 fine-tuned on the text of the Indian Constitution. It generates text consistent with the style and language of the Constitution, making it a useful resource for legal text generation and educational purposes.


Model Details

Model Description

This model is a fine-tuned version of the DistilGPT2 model, specifically trained on the text of the Indian Constitution. It can generate contextually accurate legal text and provides a demonstration of fine-tuning GPT-style models for domain-specific tasks.

  • Developed by: Susant Achary
  • Funded by: No specific funding; self-driven project
  • Shared by: Susant Achary
  • Model type: Causal language model (autoregressive Transformer)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Fine-tuned from: distilbert/distilgpt2

Model Sources

Training Data

  • Dataset repository: Susant-Achary/constitution-of-india-dataset

Uses

Direct Use

The model is suitable for generating:

  • Contextually accurate text resembling the Indian Constitution.
  • Legal or constitutional examples for research or education.
  • Domain-specific text generation tasks.

Downstream Use

The model can be further fine-tuned, as sketched below, for:

  • Other legal text corpora.
  • Domain-specific legal or policy text generation.
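
As a rough illustration of such downstream use, here is a minimal fine-tuning sketch using the transformers Trainer API. The corpus file my_legal_corpus.txt, the assumption of a "text" column, and all hyperparameters are placeholders for your own setup, not a description of how this model was actually trained:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Susant-Achary/distilgpt2-constitution-of-india"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder corpus: a plain-text file with one passage per line.
dataset = load_dataset("text", data_files={"train": "my_legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal-LM objective; labels are built from the inputs.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="distilgpt2-legal-finetuned",  # placeholder output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()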

Out-of-Scope Use

  • Malicious or unethical use, including generating misleading or harmful legal text.
  • Tasks requiring understanding or reasoning outside the scope of its training data (e.g., non-legal content).

Bias, Risks, and Limitations

Biases

  • The model is limited to the specific style and content of the Indian Constitution, which may not generalize well to other legal systems or contexts.

Limitations

  • Narrow training domain: the model was trained solely on the Indian Constitution, so it may struggle with prompts outside this domain.
  • No legal reasoning: the model generates stylistically similar text but cannot provide explanations or legal reasoning.

Recommendations

  • Use responsibly in legal and educational contexts.
  • Verify generated text before use to avoid inaccuracies or misinterpretation.

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import pipeline

model_name = "Susant-Achary/distilgpt2-constitution-of-india"

# Build a text-generation pipeline with the fine-tuned model and its tokenizer.
gen_pipeline = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=model_name,
)

# Generate a continuation of a constitutional-style prompt.
prompt = "We, the people of India"
output = gen_pipeline(
    prompt,
    max_length=100,         # total length of prompt plus generated tokens
    do_sample=True,         # sample instead of greedy decoding
    temperature=0.8,        # soften the output distribution
    top_k=100,              # sample only from the 100 most likely tokens
    top_p=0.95,             # nucleus sampling threshold
    num_return_sequences=1,
)

print(output[0]["generated_text"])
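
If you prefer to load the model and tokenizer explicitly rather than through the pipeline, the sketch below is an equivalent alternative that mirrors the same sampling parameters:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Susant-Achary/distilgpt2-constitution-of-india"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("We, the people of India", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=100,
        do_sample=True,
        temperature=0.8,
        top_k=100,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))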

Technical Specifications

  • Model size: 81.9M parameters
  • Tensor type: F32
  • Weights format: Safetensors