# Model Card for flan-t5-finetuned-summarization
A FLAN-T5 model fine-tuned for abstractive news summarization on the CNN/DailyMail dataset, which consists of news articles paired with human-written summaries.
## Model Details

### Model Description
The model was fine-tuned on the CNN/DailyMail dataset, which consists of news articles paired with human-written summaries. The training process involved:

- Loading the pre-trained FLAN-T5 model
- Preprocessing the CNN/DailyMail dataset
- Fine-tuning the model using the Seq2SeqTrainer from Hugging Face's Transformers library
- Training parameters:
  - Learning rate: 5e-5
  - Batch size: 12
  - Number of epochs: 4
  - FP16 mixed precision
- **Developed by:** Preksha Joon
- **Model type:** Sequence-to-sequence language model (text summarization)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
### Model Sources [optional]
- Repository: https://colab.research.google.com/drive/1utAHMxm1CSJIFUPZ9X4aXuIVZl3M3o6C?usp=sharing
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
## Uses
Here's an example of how to use the model for inference:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")
tokenizer = AutoTokenizer.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")

def generate_summary(article):
    # Prepend the task prefix and truncate long articles to the 512-token input limit
    inputs = tokenizer("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
    # generate() returns a batch of sequences; decode the first one
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
```
### Deploy and use the model

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="PrekshaJoon/flan-t5-finetuned-summarization")

article = "Write your article here..."
summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)
print(summary[0]['summary_text'])
```
### Direct Use

Using the `generate_summary` helper defined above:

```python
article = "Your long article text here..."
summary = generate_summary(article)
print(summary)
```
## Bias, Risks, and Limitations
[More Information Needed]
## How to Get Started with the Model
Use the code below to get started with the model.
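A minimal sketch, equivalent to the pipeline example in the Uses section above:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint through the summarization pipeline
summarizer = pipeline("summarization", model="PrekshaJoon/flan-t5-finetuned-summarization")

print(summarizer("Your long article text here...")[0]["summary_text"])
```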
## Training Details

### Training Data

The model was fine-tuned on the [CNN/DailyMail dataset](https://huggingface.co/datasets/cnn_dailymail), which pairs news articles with human-written highlight summaries.
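For reference, a sketch of loading the dataset with the `datasets` library; the `3.0.0` config is the standard summarization variant, but the exact config used for this model is an assumption:

```python
from datasets import load_dataset

# CNN/DailyMail: "article" is the source text, "highlights" the reference summary
dataset = load_dataset("cnn_dailymail", "3.0.0")  # 3.0.0 config assumed
print(dataset["train"][0]["highlights"])
```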
### Training Procedure

The training process involved:

1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the Seq2SeqTrainer from Hugging Face's Transformers library
#### Preprocessing

The dataset is preprocessed by tokenizing the articles and reference summaries and preparing them as input/target pairs for the FLAN-T5 model.
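A sketch of this tokenization step, continuing from the `dataset` loaded above. The `summarize: ` prefix mirrors the inference example; the exact length limits used during training are assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

def preprocess(examples):
    # Prefix each article with the task instruction, as in the inference example
    inputs = ["summarize: " + article for article in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    # Tokenize the reference summaries as decoder targets
    labels = tokenizer(text_target=examples["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)
```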
#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Learning rate:** 5e-5
- **Batch size:** 12
- **Epochs:** 4
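Putting the hyperparameters above together, a sketch of the `Seq2SeqTrainer` setup, continuing from the `tokenized` dataset; the output directory and data collator choice are assumptions:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-finetuned-summarization",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    num_train_epochs=4,
    fp16=True,  # mixed-precision training, as stated above
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # assumed collator
)
trainer.train()
```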
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on the validation split of the CNN/DailyMail dataset.
#### Metrics

ROUGE scores (rouge1, rouge2, rougeL, rougeLsum) were used for evaluation.
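The scores below can be computed with the Hugging Face `evaluate` wrapper around ROUGE; a sketch with placeholder texts:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders: in practice, decode model outputs on the validation split
predictions = ["the generated summary ..."]
references = ["the human-written summary ..."]

# Returns rouge1 / rouge2 / rougeL / rougeLsum F-measures by default
print(rouge.compute(predictions=predictions, references=references))
```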
### Results

The model was evaluated using ROUGE scores. Here are the results on the validation set:

| Metric | Score |
| --- | --- |
| rouge1 | 0.3913 |
| rouge2 | 0.2889 |
| rougeL | 0.3696 |
| rougeLsum | 0.3696 |
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** A100 GPU
- **Hours used:** 7
- **Cloud Provider:** Google
- **Compute Region:** [More Information Needed]
## Technical Specifications [optional]
### Model Architecture and Objective

The model is a fine-tuned google/flan-t5-small, an encoder-decoder (sequence-to-sequence) Transformer, trained with the standard sequence-to-sequence cross-entropy objective for abstractive summarization.
### Compute Infrastructure

Fine-tuning ran for about 7 hours on an A100 GPU (see Environmental Impact above).
## Glossary [optional]
[More Information Needed]
## Model Card Contact
[email protected]