Text Summarization Model with Seq2Seq and LSTM

This is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries of input articles, and was trained on article sequences of up to 800 tokens.

Dataset

CNN-DailyMail News Text Summarization dataset from Kaggle.
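
A hedged example of loading the Kaggle CSVs with pandas; the file path and the article/highlights column names are assumptions about the Kaggle release, not something stated in this card:

import pandas as pd

# Hypothetical path to the downloaded Kaggle data; column names are assumed
train_df = pd.read_csv("cnn_dailymail/train.csv")
articles = train_df["article"].astype(str).tolist()     # source texts
summaries = train_df["highlights"].astype(str).tolist() # reference summaries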

Model Architecture

Encoder

  • Input Layer: Takes input sequences of length max_len_article.
  • Embedding Layer: Converts input sequences into dense vectors of size 100.
  • Bidirectional LSTM Layer: Processes the embedded input, capturing dependencies in both forward and backward directions. Outputs hidden and cell states from both directions.
  • State Concatenation: Concatenates the forward and backward hidden states, and the forward and backward cell states, to form the final encoder states (see the sketch below).
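
A minimal Keras sketch of this encoder, assuming 100 LSTM units per direction and an article vocabulary of roughly 476,199 tokens (both inferred from the parameter counts in the model summary below):

from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Concatenate

max_len_article = 800        # maximum article length used during training
article_vocab_size = 476199  # assumption: inferred from the 47,619,900 embedding parameters

# Embed the article and run a bidirectional LSTM that also returns its states
encoder_inputs = Input(shape=(max_len_article,))
encoder_embedding = Embedding(article_vocab_size, 100)(encoder_inputs)
encoder_outputs, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(100, return_state=True)
)(encoder_embedding)

# Concatenate forward and backward states to form the final encoder states
state_h = Concatenate()([fwd_h, bwd_h])
state_c = Concatenate()([fwd_c, bwd_c])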

Decoder

  • Input Layer: Takes target sequences of variable length.
  • Embedding Layer: Converts target sequences into dense vectors of size 100.
  • LSTM Layer: Processes the embedded target sequences using an LSTM with the initial states set to the encoder states.
  • Dense Layer: A Dense layer with softmax activation produces a probability distribution over the summary vocabulary at each decoding step (see the sketch below).
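
Continuing the encoder sketch above (reusing state_h and state_c), a hedged sketch of the decoder; the summary vocabulary size of 155,158 is taken from the dense layer's output shape in the model summary below:

from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

summary_vocab_size = 155158  # taken from the dense layer's output shape below

# Embed the (teacher-forced) target summary and run an LSTM initialised with the encoder states
decoder_inputs = Input(shape=(None,))
decoder_embedding = Embedding(summary_vocab_size, 100)(decoder_inputs)
decoder_outputs, _, _ = LSTM(200, return_sequences=True, return_state=True)(
    decoder_embedding, initial_state=[state_h, state_c]
)

# Softmax over the summary vocabulary at every decoding step
decoder_outputs = Dense(summary_vocab_size, activation='softmax')(decoder_outputs)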

Model Summary

Layer (type)                   Output Shape                                     Param #      Connected to
input_1 (InputLayer)           [(None, 800)]                                    0            -
embedding (Embedding)          (None, 800, 100)                                 47,619,900   input_1[0][0]
bidirectional (Bidirectional)  [(None, 200), (None, 100), (None, 100),          160,800      embedding[0][0]
                                (None, 100), (None, 100)]
input_2 (InputLayer)           [(None, None)]                                   0            -
embedding_1 (Embedding)        (None, None, 100)                                15,515,800   input_2[0][0]
concatenate (Concatenate)      (None, 200)                                      0            bidirectional[0][1], bidirectional[0][3]
concatenate_1 (Concatenate)    (None, 200)                                      0            bidirectional[0][2], bidirectional[0][4]
lstm (LSTM)                    [(None, None, 200), (None, 200), (None, 200)]    240,800      embedding_1[0][0], concatenate[0][0], concatenate_1[0][0]
dense (Dense)                  (None, None, 155158)                             31,186,758   lstm[0][0]

Total params: 94,724,058

Trainable params: 94,724,058

Non-trainable params: 0
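
Tying the encoder and decoder sketches above together into the full training model (the layer names in the table are Keras defaults):

from tensorflow.keras.models import Model

# Assemble the full training model from the encoder and decoder sketches above
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()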

Training

The model was trained on sequences of up to 800 tokens using the following configuration (a compile/fit sketch follows the list):

  • Optimizer: Adam
  • Loss Function: Categorical Crossentropy
  • Metrics: Accuracy
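
A hedged compile-and-fit sketch for this configuration. The training array names are hypothetical placeholders, and the batch size and validation split are not reported in this card:

# Compile with the configuration above; categorical cross-entropy implies one-hot targets
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Hypothetical, preprocessed and padded arrays (not provided by this card)
model.fit(
    [x_train_article, x_train_summary_in],
    y_train_summary_out,
    validation_data=([x_val_article, x_val_summary_in], y_val_summary_out),
    epochs=5,          # matches the five epochs reported below
    batch_size=64,     # assumption: batch size is not reported
)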

Training Loss and Validation Loss

Epoch   Training Loss   Validation Loss   Time per Epoch (s)
1       3.9044          0.4543            3087
2       0.3429          0.0976            3091
3       0.1054          0.0427            3096
4       0.0490          0.0231            3099
5       0.0203          0.0148            3098

Test Loss

Test loss on the held-out set: 0.0148
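
This value corresponds to a standard Keras evaluation on the preprocessed test split; the array names below are hypothetical placeholders:

# Evaluate on the held-out test set (hypothetical, preprocessed arrays)
test_loss, test_accuracy = model.evaluate([x_test_article, x_test_summary_in], y_test_summary_out)
print(test_loss)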

Usage (I will update this section soon)

To use this model, load it with the Hugging Face Transformers library:

from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

# Summarize an article
article = "Your input text here."
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf", max_length=800, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)