gpssohi committed on
Commit f073544 · 1 Parent(s): 92ff6f5

Update README.md

Files changed (1): README.md +6 -2
README.md CHANGED
@@ -10,12 +10,16 @@ datasets:
 
 # Introduction
 
-[HuggingFace](https://huggingface.co/) is one of the most useful libraries for an NLP researcher / developer, as it provides numerous pre-trained models, datasets, and utility functions for NLP. In this repository, I'm trying to set up a complete pipeline for a Machine Learning project, and the task I've chosen for the setup is Question Generation for Paragraphs. This is a seq2seq task, for which I intend to fine-tune a pre-trained encoder-decoder Transformer model for Extractive Summarization, such as BART / Pegasus. More specifically, I'm fine-tuning the `sshleifer/distilbart-cnn-6-6` model on the SQuAD dataset.
+This model checkpoint is obtained by fine-tuning the `sshleifer/distilbart-cnn-6-6` summarization checkpoint on the SQuAD dataset.
 
 # Usage
 
 The input format is as follows: `[answer] <s> [passage]`. The model will predict the question that corresponds to the answer from the passage.
 
+# Plot
+
+![Training Run](plots/train_run_6.png)
+
 # Dataset
 
 The goal of Question Generation is to generate a valid and fluent question according to a given passage and the target answer. Hence, the input to the model is a passage context and an answer, and the output / target is the question for the given answer. Question Generation can be used in many scenarios, such as automatic tutoring systems, improving the performance of Question Answering models, and enabling chat-bots to lead a conversation. The final dataset is created by taking the union of the following Question Answering datasets. The dataset must have the following three columns: context, question, answer.
@@ -26,7 +30,7 @@ Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset,
 
 ### Preprocessing
 
-The first step is to remove questions which don't have answers. After that, we split the train set into Train and Eval sets and treat the dev set as the test set.
+The first step is to remove questions that don't have answers. After that, we split the train set into Train and Eval sets and treat the dev set as the test set.
 
 ### Stats
 
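The `[answer] <s> [passage]` input described in the Usage section can be assembled with a small helper. This is only an illustrative sketch: the helper name and the example passage are not part of the repository, and the actual tokenizer may handle the `<s>` separator differently.

```python
def build_qgen_input(answer: str, passage: str) -> str:
    """Join answer and passage with the <s> separator the model expects."""
    return f"{answer} <s> {passage}"


# Illustrative example (not from the repository):
passage = (
    "Oxygen is a chemical element with symbol O and atomic number 8. "
    "It is a member of the chalcogen group on the periodic table."
)
answer = "8"
model_input = build_qgen_input(answer, passage)
print(model_input)  # starts with "8 <s> Oxygen is a chemical element ..."
```

The formatted string is then passed to the fine-tuned seq2seq model, which generates the question as its output sequence.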
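The preprocessing step added in this commit (drop questions without answers, then carve an Eval split out of the train set) can be sketched as follows. This assumes SQuAD-style records where an unanswerable question carries an empty `answers` list; the function name, split fraction, and seed are illustrative, not taken from the repository.

```python
import random


def preprocess(records, eval_frac=0.1, seed=42):
    """Drop records without answers, then split the rest into Train/Eval."""
    # Remove unanswerable questions (empty or missing "answers" field).
    answerable = [r for r in records if r.get("answers")]
    # Deterministic shuffle before splitting, for reproducibility.
    rng = random.Random(seed)
    rng.shuffle(answerable)
    n_eval = int(len(answerable) * eval_frac)
    return answerable[n_eval:], answerable[:n_eval]


records = [
    {"context": "c1", "question": "q1", "answers": ["a1"]},
    {"context": "c2", "question": "q2", "answers": []},  # unanswerable: dropped
    {"context": "c3", "question": "q3", "answers": ["a3"]},
]
train_set, eval_set = preprocess(records, eval_frac=0.5)
print(len(train_set), len(eval_set))  # 1 1
```

The SQuAD dev set is kept untouched and used as the test set, as the README describes.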