Model Card for DistilGPT2 Fine-Tuned on the Indian Constitution

Model Summary

This model is a version of DistilGPT2 fine-tuned on the text of the Indian Constitution. It generates text consistent with the style and language of the Constitution, making it a useful resource for legal text generation and educational purposes.


Model Details

Model Description

This model is a fine-tuned version of the DistilGPT2 model, specifically trained on the text of the Indian Constitution. It can generate contextually accurate legal text and provides a demonstration of fine-tuning GPT-style models for domain-specific tasks.

  • Developed by: Susant Achary
  • Funded by: No specific funding; self-driven project
  • Shared by: Susant Achary
  • Model type: Causal language model (autoregressive Transformer)
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Fine-tuned from: distilbert/distilgpt2

Model Sources

Training Data

  • Dataset repository: Susant-Achary/constitution-of-india-dataset

Uses

Direct Use

The model is suitable for generating:

  • Contextually accurate text resembling the Indian Constitution.
  • Legal or constitutional examples for research or education.
  • Domain-specific text generation tasks.

Downstream Use

The model can be further fine-tuned, as sketched below, for:

  • Other legal text corpora.
  • Domain-specific legal or policy text generation.
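
As a rough illustration of such downstream use, here is a minimal fine-tuning sketch using the transformers Trainer API. The corpus file my_legal_corpus.txt, the assumption of a "text" column, and all hyperparameters are placeholders for your own setup, not a description of how this model was actually trained:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Susant-Achary/distilgpt2-constitution-of-india"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder corpus: a plain-text file with one passage per line.
dataset = load_dataset("text", data_files={"train": "my_legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False selects the causal-LM objective; labels are built from the inputs.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="distilgpt2-legal-finetuned",  # placeholder output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()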

Out-of-Scope Use

  • Malicious or unethical use, including generating misleading or harmful legal text.
  • Tasks requiring understanding or reasoning outside the scope of its training data (e.g., non-legal content).

Bias, Risks, and Limitations

Biases

  • The model is limited to the specific style and content of the Indian Constitution, which may not generalize well to other legal systems or contexts.

Limitations

  • Narrow training domain: the model was trained solely on the Indian Constitution, so it may struggle with prompts outside this domain.
  • No legal reasoning: the model generates stylistically similar text but cannot provide explanations or legal reasoning.

Recommendations

  • Use responsibly in legal and educational contexts.
  • Verify generated text before use to avoid inaccuracies or misinterpretation.

How to Get Started with the Model

Use the code below to get started with the model:

from transformers import pipeline

model_name = "Susant-Achary/distilgpt2-constitution-of-india"

# Build a text-generation pipeline with the fine-tuned model and its tokenizer.
gen_pipeline = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=model_name,
)

# Generate a continuation of a constitutional-style prompt.
prompt = "We, the people of India"
output = gen_pipeline(
    prompt,
    max_length=100,         # total length of prompt plus generated tokens
    do_sample=True,         # sample instead of greedy decoding
    temperature=0.8,        # soften the output distribution
    top_k=100,              # sample only from the 100 most likely tokens
    top_p=0.95,             # nucleus sampling threshold
    num_return_sequences=1,
)

print(output[0]["generated_text"])
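
If you prefer to load the model and tokenizer explicitly rather than through the pipeline, the sketch below is an equivalent alternative that mirrors the same sampling parameters:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Susant-Achary/distilgpt2-constitution-of-india"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("We, the people of India", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=100,
        do_sample=True,
        temperature=0.8,
        top_k=100,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))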

Technical Specifications

  • Model size: 81.9M parameters
  • Tensor type: F32
  • Weights format: Safetensors