FLAN T5 Small Questionizer

This model converts declarative statements into questions.

Example:
Input: The sun rises in the east and sets in the west.
Output: Where does the sun rise and set?

Usage

from transformers import pipeline

# Load the model
questionizer = pipeline("text2text-generation", model="agentlans/flan-t5-small-questionizer")

# Convert a statement into a question
statement = "Water covers approximately 71% of the Earth's surface, making it the most abundant substance on the planet's exterior."
question = questionizer(statement)[0]['generated_text']

print(question)
# Output: What percentage of the Earth's surface does water cover?

Examples

Simple sentence examples

Input: The sun rises in the east and sets in the west.
Output: Where does the sun rise and set?

Input: Python is a popular programming language for beginners.
Output: What is a popular programming language for beginners?

Input: Elephants are the largest land animals on Earth.
Output: What are the largest land animals on Earth?

Input: Rainbows appear when sunlight passes through raindrops.
Output: When do rainbows appear?

Input: Saturn has beautiful rings made of ice and rock.
Output: What is the shape of Saturn's rings?

Input: Coffee is enjoyed by millions of people every morning.
Output: How many people enjoy coffee every morning?

Input: Mount Everest is the highest mountain in the world.
Output: What is the highest mountain in the world?

Input: Honeybees communicate through a dance called the waggle.
Output: How do honeybees communicate?

Input: Penguins live in cold climates and cannot fly.
Output: Where do Penguins live and cannot fly?

Input: Artists use different colors and shapes to express ideas.
Output: What do artists use to express ideas?

Complex sentence examples

These sentences were randomly selected from the agentlans/high-quality-english-sentences dataset.

Input: Want to know what takes up the most space on your Hard Drive?
Output: What do you want to know about the most space on your Hard Drive?

Input: DISA officials say Defense must embrace mobility management tools to ensure military users don't lag behind the private sector in secure mobile computing capability worldwide.
Output: What do DISA officials say Defense must do to ensure military users don't lag behind the private sector in secure mobile computing capability worldwide?

Input: This is why learning a healthy nutritious eating pattern and adopting it for life, that is, lifestyle changes has a significant impact on our ability to reach and maintain a healthy weight.
Output: Why is it important to learn a healthy nutritious eating pattern and adopt it for life?

Input: In 1933, Bartlett was chosen to create the very first gift print, a color etching called 'Java'.
Output: What color etching was Bartlett chosen to create in 1933?

Input: In the United States, NERC petitions the Federal Energy Regulatory Commission (FERC) for approval of standards.
Output: What does NERC petition the Federal Energy Regulatory Commission (FERC) for in the United States?

Input: The majority of large businesses are public companies, with government enterprises, not-for-profit organisations and superannuation funds also represented.
Output: What is the majority of large businesses, and what types of organizations are represented?

Input: Because blueberries need moisture close to the soil surface, it is important to mulch them well.
Output: Why is it important to mulch blueberries well?

Input: One of the victims was a man but the other could not be determined.
Output: Who was one of the victims, and how was the other determined?

Input: The statute gives States and local educational agencies significant flexibility in how they direct resources and tailor interventions to the needs of individual schools identified for improvement.
Output: What flexibility does the statute provide for States and local educational agencies?

Input: Similar legislation would allay any hesitancy on the par of the banks in sharing cyber threat information with the government, Tunstall suggests.
Output: What would similar legislation allay in sharing cyber threat information with the government, according to Tunstall?

Limitations

  • The model works best with statements that provide enough context. Short or vague sentences may lead to hallucinated or unrelated questions. Example:

    Input: No.
    Output: Is there a requirement for a person to have a copy of a book in a library?

  • Not all statements are suitable for question generation. Some inputs may produce awkward questions or questions that do not match the intended meaning.

Tips for Better Results

  1. Use clear, informative statements: Include enough context so the model can generate a meaningful question.
  2. Prefer factual sentences: The model performs better on statements that contain concrete information (dates, quantities, events, definitions).
  3. Avoid extremely short inputs: Single words or one-word statements rarely produce useful questions.
  4. Check generated questions: While the model is powerful, review outputs for accuracy and relevance, especially for educational or professional use.
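Tips 1 and 3 can be applied automatically with a small pre-filter before calling the model. The sketch below is illustrative only: the five-word threshold is an arbitrary assumption, not a documented cutoff, and `is_suitable` is a hypothetical helper, not part of the model's API.

```python
def is_suitable(statement: str, min_words: int = 5) -> bool:
    """Heuristic pre-filter: skip inputs too short to carry enough
    context for question generation. The threshold is an assumption."""
    return len(statement.strip().split()) >= min_words

statements = [
    "No.",  # too short: likely to yield a hallucinated question
    "Mount Everest is the highest mountain in the world.",  # informative
]

# Only statements that pass the filter would be sent to the questionizer.
suitable = [s for s in statements if is_suitable(s)]
print(suitable)
```

Statements that fail the filter can be logged or skipped rather than passed to the model, avoiding the hallucinated-question failure mode shown under Limitations.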

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 20.0
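For reference, these hyperparameters map onto a standard `Seq2SeqTrainingArguments` configuration roughly as sketched below. The argument names are the usual transformers ones; `output_dir` is a placeholder, and any setting not listed above is left at its default rather than taken from the actual run.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-questionizer",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",  # fused AdamW with default betas/epsilon
    lr_scheduler_type="linear",
    num_train_epochs=20.0,
)
```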

Training results

The model was trained for 20 epochs on over 153k samples, processing 221M tokens. It achieved a training loss of 0.64 and an evaluation loss of 1.30.

Training was efficient, with ~385 samples/sec and ~27k tokens/sec, and evaluation ran at ~820 samples/sec.
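As a back-of-the-envelope sanity check (not a figure from the card), the reported throughput implies a total training time of a little over two hours, and the samples-based and tokens-based estimates agree:

```python
# Rough wall-clock estimates from the reported training metrics.
total_tokens = 221_000_000   # ~221M tokens over the full run
tokens_per_sec = 27_000      # ~27k tokens/sec training throughput
hours_by_tokens = total_tokens / tokens_per_sec / 3600

total_samples = 153_000 * 20  # ~153k samples x 20 epochs
samples_per_sec = 385
hours_by_samples = total_samples / samples_per_sec / 3600

print(f"~{hours_by_tokens:.1f} h by tokens, ~{hours_by_samples:.1f} h by samples")
# Both estimates land around 2.2-2.3 hours.
```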

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 4.3.0
  • Tokenizers 0.22.1

License

Apache 2.0

Model details

  • Model size: 77M parameters
  • Tensor type: F32 (Safetensors)
  • Training dataset: agentlans/high-quality-english-sentences