# Model Card for PrekshaJoon/flan-t5-finetuned-summarization

A FLAN-T5 model fine-tuned for abstractive news summarization on the CNN/DailyMail dataset.

## Model Details

### Model Description

The model was fine-tuned on the CNN/DailyMail dataset, which consists of news articles paired with human-written summaries. The training process involved:

1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library
4. Training parameters:
   - Learning rate: 5e-5
   - Batch size: 12
   - Number of epochs: 4
   - FP16 mixed precision

- **Developed by:** Preksha Joon
- **Model type:** Sequence-to-sequence (encoder-decoder) Transformer for abstractive summarization
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** FLAN-T5

### Model Sources

- **Repository:** https://huggingface.co/PrekshaJoon/flan-t5-finetuned-summarization

## Uses

Here's an example of how to use the model for inference:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")
tokenizer = AutoTokenizer.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")

def generate_summary(article):
    # Prefix the article with the T5 summarization instruction and tokenize
    inputs = tokenizer("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
    # Generate a summary with beam search
    summary_ids = model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
    # Decode the first generated sequence back into text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
```

### Deploy and Use the Model

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="PrekshaJoon/flan-t5-finetuned-summarization")

article = "Write your article here..."
summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)

print(summary[0]['summary_text'])
```

### Direct Use

Using the `generate_summary` helper defined above:

```python
article = "Your long article text here..."
summary = generate_summary(article)
print(summary)
```



## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

## How to Get Started with the Model

Use the code in the [Uses](#uses) section above: load the checkpoint with `AutoModelForSeq2SeqLM` and `AutoTokenizer`, or serve it with the `summarization` pipeline.

## Training Details

### Training Data

The model was fine-tuned on the [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) dataset, which pairs news articles with human-written highlight summaries.
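A minimal sketch of loading the dataset with the `datasets` library; the `"3.0.0"` configuration name is an assumption, as the card does not state which version was used:

```python
from datasets import load_dataset

# Load CNN/DailyMail; "3.0.0" is the commonly used configuration (assumed here)
dataset = load_dataset("cnn_dailymail", "3.0.0")

print(dataset)                                  # train / validation / test splits
print(dataset["train"][0]["article"][:200])     # a news article
print(dataset["train"][0]["highlights"])        # its reference summary
```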

### Training Procedure

The training process involved:

1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library, with the hyperparameters listed below

#### Preprocessing

The dataset is preprocessed by tokenizing the articles as model inputs and the reference summaries as labels, truncating both to fixed maximum lengths before fine-tuning the FLAN-T5 model.
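A minimal preprocessing sketch, reusing the `tokenizer` and `dataset` objects from the examples above and assuming the same `"summarize: "` prefix and 512/128 token limits as the inference code (the exact values used for training are not stated in this card):

```python
def preprocess(examples):
    # Prefix each article with the T5-style instruction (assumed to match inference)
    inputs = ["summarize: " + article for article in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)

    # Tokenize the reference summaries as labels
    labels = tokenizer(text_target=examples["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)
```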


#### Training Hyperparameters

- **Learning rate:** 5e-5
- **Batch size:** 12
- **Number of epochs:** 4
- **Training regime:** fp16 mixed precision
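A minimal fine-tuning sketch with `Seq2SeqTrainer` using the hyperparameters above; the output directory name, evaluation batch size, and data collator settings are assumptions, since the original training script is not included in this card:

```python
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-finetuned-summarization",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,      # assumed to match the training batch size
    num_train_epochs=4,
    fp16=True,                          # mixed-precision training
    predict_with_generate=True,         # generate summaries during evaluation
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

trainer.train()
```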

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on the validation split of the CNN/DailyMail dataset.


#### Metrics

ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum) are used for evaluation, comparing generated summaries against the reference highlights.
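A minimal sketch of computing ROUGE with the `evaluate` library and the `generate_summary` helper from the Uses section; using `evaluate` here is an assumption about tooling, not a statement of how the reported scores were produced:

```python
import evaluate

rouge = evaluate.load("rouge")

# Summarize a handful of validation articles and score them against the references
articles = dataset["validation"]["article"][:8]
references = dataset["validation"]["highlights"][:8]
predictions = [generate_summary(a) for a in articles]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum
```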

### Results

The model was evaluated using ROUGE scores. Results on the validation set:

| Metric     | Score  |
|------------|--------|
| ROUGE-1    | 0.3913 |
| ROUGE-2    | 0.2889 |
| ROUGE-L    | 0.3696 |
| ROUGE-Lsum | 0.3696 |

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100 GPU
- **Hours used:** 7
- **Cloud Provider:** Google
- **Compute Region:** [More Information Needed]


## Technical Specifications [optional]

### Model Architecture and Objective

FLAN-T5 is an encoder-decoder (sequence-to-sequence) Transformer trained with a text-to-text objective. This checkpoint (roughly 77M parameters) is fine-tuned for abstractive summarization of English news articles.
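A quick way to inspect the architecture of the uploaded checkpoint, assuming the repository id shown in the usage examples:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")
print(config.model_type)                                    # "t5"
print(config.d_model, config.num_layers, config.num_heads)  # hidden size, layers, attention heads
```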

### Compute Infrastructure

Fine-tuning was run on an NVIDIA A100 GPU on Google Cloud for approximately 7 hours (see Environmental Impact above).


## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]


## Model Card Contact
[email protected]