hooman650 committed
Commit 3ac32fe · 1 Parent(s): ac05daa

Updated to do list

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -8,6 +8,19 @@ The main goals of this project are:
  2. Release the top performing models for further research and enhancement
  3. Release all of the preprocessing and postprocessing scripts and findings for future research.
 
+ ## TO DO LIST:
+ - [x] Team members met and the following was discussed:
+   - The data preparation script that mixes CORD-19 and PubMed is ready.
+   - Agreed to finalize the training scripts by 9 pm PDT on 7/9/2021.
+   - The tokenizer is now trained.
+ - [ ] Set up the pretraining script
+ - [ ] Prepare the fine-tuning tasks, inspired by the [T5 Trivia Colab](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb)
+   - Which datasets do we want to go with?
+     - [Covid-QA](https://huggingface.co/datasets/covid_qa_deepset) (maybe as a test set?)
+     - [Trivia](https://huggingface.co/datasets/covid_qa_deepset)
+     - [CDC-QA](https://www.cdc.gov/coronavirus/2019-ncov/faq.html) (we can scrape this quickly with Beautiful Soup or a similar tool)
+     - [More Medical Datasets](https://aclanthology.org/2020.findings-emnlp.289.pdf) (see the dataset section for inspiration)
+
  ## 1. Model
 
  We will be using the T5 model.
@@ -35,4 +48,4 @@ We can make use of:
 
  ## 4. Additional Reading
 
- - [How Much Knowledge Can You Pack Into the Parameters of a Language Model?](https://arxiv.org/pdf/2002.08910.pdf)
+ - [How Much Knowledge Can You Pack Into the Parameters of a Language Model?](https://arxiv.org/pdf/2002.08910.pdf)
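
The to-do list in the diff above mentions a data preparation script that mixes CORD-19 and PubMed. Below is a minimal sketch of that kind of mixing with the Hugging Face `datasets` library, assuming the two corpora have already been exported to plain-text files; the file names, the 50/50 sampling ratio, and the output path are placeholders, not details from the project's actual script.

```python
from datasets import load_dataset, interleave_datasets

# Assumed local exports of the two corpora, one document per line.
# These paths are placeholders; the project's real preparation script may differ.
cord19 = load_dataset("text", data_files={"train": "cord19_abstracts.txt"}, split="train")
pubmed = load_dataset("text", data_files={"train": "pubmed_abstracts.txt"}, split="train")

# Interleave the two sources; the equal sampling ratio is an assumption.
mixed = interleave_datasets([cord19, pubmed], probabilities=[0.5, 0.5], seed=42)

# Shuffle and write out one JSON record per line for the later steps.
mixed.shuffle(seed=42).to_json("mixed_pretraining_corpus.jsonl")
```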
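The note that the tokenizer is now trained does not say how it was built. One plausible route, sketched here under the assumption that a T5-style vocabulary is retrained on the mixed corpus with `transformers`' `train_new_from_iterator`; the 32k vocabulary size and the corpus path are assumptions.

```python
import json
from transformers import AutoTokenizer

# Start from the reference T5 tokenizer and retrain its vocabulary on the
# mixed CORD-19/PubMed corpus. The 32k vocab size is an assumption.
base_tokenizer = AutoTokenizer.from_pretrained("t5-small")

def corpus_iterator(path="mixed_pretraining_corpus.jsonl", batch_size=1000):
    # Yield batches of raw text read from the JSONL written by the mixing step.
    batch = []
    with open(path) as f:
        for line in f:
            batch.append(json.loads(line)["text"])
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

new_tokenizer = base_tokenizer.train_new_from_iterator(corpus_iterator(), vocab_size=32000)
new_tokenizer.save_pretrained("covid-t5-tokenizer")
```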
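For the "Set up the pretraining script" item and the plan to use T5, here is a hedged sketch of initializing a fresh model around the retrained tokenizer; using `t5-base` as the reference configuration and the tokenizer path are assumptions, not decisions recorded in the README.

```python
from transformers import AutoTokenizer, T5Config, T5ForConditionalGeneration

# Load the retrained tokenizer (path is a placeholder) and build a randomly
# initialized T5 around it for pretraining; no pretrained weights are loaded.
tokenizer = AutoTokenizer.from_pretrained("covid-t5-tokenizer")
config = T5Config.from_pretrained("t5-base", vocab_size=len(tokenizer))
model = T5ForConditionalGeneration(config)

print(f"T5 initialized with {model.num_parameters() / 1e6:.0f}M parameters")
```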
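For the fine-tuning tasks inspired by the T5 Trivia Colab, one way to cast the linked Covid-QA dataset into closed-book text-to-text pairs is sketched below; the `covid question:` prefix and the SQuAD-style field names are assumptions about the task format and the dataset's schema.

```python
from datasets import load_dataset

# covid_qa_deepset is the dataset linked in the to-do list; the field names
# below assume its usual SQuAD-style schema (question / answers.text).
covid_qa = load_dataset("covid_qa_deepset", split="train")

def to_text_to_text(example):
    # Closed-book style, as in the T5 trivia notebook: the model sees only
    # the question and must produce the answer from its parameters.
    answers = example["answers"]["text"]
    return {
        "inputs": "covid question: " + example["question"],
        "targets": answers[0] if answers else "",
    }

t5_format = covid_qa.map(to_text_to_text, remove_columns=covid_qa.column_names)
print(t5_format[0])
```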
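The CDC-QA item suggests scraping the linked FAQ page with Beautiful Soup. A rough sketch of that idea follows; the assumed page structure (questions as headings followed by answer paragraphs) is a guess and would need checking against the live page.

```python
import requests
from bs4 import BeautifulSoup

# The CDC FAQ page linked in the to-do list.
URL = "https://www.cdc.gov/coronavirus/2019-ncov/faq.html"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

qa_pairs = []
# Assumption: each question is a heading and its answer is the text that
# follows until the next heading.
for heading in soup.find_all(["h2", "h3", "h4"]):
    question = heading.get_text(strip=True)
    answer_parts = []
    for sibling in heading.find_next_siblings():
        if sibling.name in ("h2", "h3", "h4"):
            break
        answer_parts.append(sibling.get_text(" ", strip=True))
    answer = " ".join(part for part in answer_parts if part)
    if question.endswith("?") and answer:
        qa_pairs.append({"question": question, "answer": answer})

print(len(qa_pairs), "question/answer pairs scraped")
```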