hooman650 committed
Commit 3ac32fe · 1 Parent(s): ac05daa

Updated to do list

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -8,6 +8,19 @@ The main goals of this project are:
  2. Release the top performing models for further research and enhancement
  3. Release all of the preprocessing and postprocessing scripts and findings for future research.
 
+ ## TO DO LIST:
+ - [x] Team members met and the following was discussed:
+   - The data preparation script that mixes CORD-19 and PubMed is ready.
+   - Agreed to finalize the training scripts by 9 pm PDT on 7/9/2021.
+   - The tokenizer is now trained.
+ - [ ] Set up the pretraining script
+ - [ ] Prepare the fine-tuning tasks, inspired by the [T5 Trivia Colab](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb)
+   - Which datasets do we want to go with?
+     - [Covid-QA](https://huggingface.co/datasets/covid_qa_deepset) (maybe as a test set?)
+     - [Trivia](https://huggingface.co/datasets/covid_qa_deepset)
+     - [CDC-QA](https://www.cdc.gov/coronavirus/2019-ncov/faq.html) (we can scrape this quickly with Beautiful Soup or a similar tool)
+     - [More Medical Datasets](https://aclanthology.org/2020.findings-emnlp.289.pdf) (see the dataset section for inspiration)
+
  ## 1. Model
 
  We will be using the T5 model.
@@ -35,4 +48,4 @@ We can make use of:
 
  ## 4. Additional Reading
 
- - [How Much Knowledge Can You Pack Into the Parameters of a Language Model?](https://arxiv.org/pdf/2002.08910.pdf)
+ - [How Much Knowledge Can You Pack Into the Parameters of a Language Model?](https://arxiv.org/pdf/2002.08910.pdf)
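
The to-do list in the diff above mentions a data preparation script that mixes CORD-19 and PubMed. Below is a minimal sketch of that kind of mixing with the Hugging Face `datasets` library, assuming the two corpora have already been exported to plain-text files; the file names, the 50/50 sampling ratio, and the output path are placeholders, not details from the project's actual script.

```python
from datasets import load_dataset, interleave_datasets

# Assumed local exports of the two corpora, one document per line.
# These paths are placeholders; the project's real preparation script may differ.
cord19 = load_dataset("text", data_files={"train": "cord19_abstracts.txt"}, split="train")
pubmed = load_dataset("text", data_files={"train": "pubmed_abstracts.txt"}, split="train")

# Interleave the two sources; the equal sampling ratio is an assumption.
mixed = interleave_datasets([cord19, pubmed], probabilities=[0.5, 0.5], seed=42)

# Shuffle and write out one JSON record per line for the later steps.
mixed.shuffle(seed=42).to_json("mixed_pretraining_corpus.jsonl")
```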
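The note that the tokenizer is now trained does not say how it was built. One plausible route, sketched here under the assumption that a T5-style vocabulary is retrained on the mixed corpus with `transformers`' `train_new_from_iterator`; the 32k vocabulary size and the corpus path are assumptions.

```python
import json
from transformers import AutoTokenizer

# Start from the reference T5 tokenizer and retrain its vocabulary on the
# mixed CORD-19/PubMed corpus. The 32k vocab size is an assumption.
base_tokenizer = AutoTokenizer.from_pretrained("t5-small")

def corpus_iterator(path="mixed_pretraining_corpus.jsonl", batch_size=1000):
    # Yield batches of raw text read from the JSONL written by the mixing step.
    batch = []
    with open(path) as f:
        for line in f:
            batch.append(json.loads(line)["text"])
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

new_tokenizer = base_tokenizer.train_new_from_iterator(corpus_iterator(), vocab_size=32000)
new_tokenizer.save_pretrained("covid-t5-tokenizer")
```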
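For the "Set up the pretraining script" item and the plan to use T5, here is a hedged sketch of initializing a fresh model around the retrained tokenizer; using `t5-base` as the reference configuration and the tokenizer path are assumptions, not decisions recorded in the README.

```python
from transformers import AutoTokenizer, T5Config, T5ForConditionalGeneration

# Load the retrained tokenizer (path is a placeholder) and build a randomly
# initialized T5 around it for pretraining; no pretrained weights are loaded.
tokenizer = AutoTokenizer.from_pretrained("covid-t5-tokenizer")
config = T5Config.from_pretrained("t5-base", vocab_size=len(tokenizer))
model = T5ForConditionalGeneration(config)

print(f"T5 initialized with {model.num_parameters() / 1e6:.0f}M parameters")
```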
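For the fine-tuning tasks inspired by the T5 Trivia Colab, one way to cast the linked Covid-QA dataset into closed-book text-to-text pairs is sketched below; the `covid question:` prefix and the SQuAD-style field names are assumptions about the task format and the dataset's schema.

```python
from datasets import load_dataset

# covid_qa_deepset is the dataset linked in the to-do list; the field names
# below assume its usual SQuAD-style schema (question / answers.text).
covid_qa = load_dataset("covid_qa_deepset", split="train")

def to_text_to_text(example):
    # Closed-book style, as in the T5 trivia notebook: the model sees only
    # the question and must produce the answer from its parameters.
    answers = example["answers"]["text"]
    return {
        "inputs": "covid question: " + example["question"],
        "targets": answers[0] if answers else "",
    }

t5_format = covid_qa.map(to_text_to_text, remove_columns=covid_qa.column_names)
print(t5_format[0])
```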
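The CDC-QA item suggests scraping the linked FAQ page with Beautiful Soup. A rough sketch of that idea follows; the assumed page structure (questions as headings followed by answer paragraphs) is a guess and would need checking against the live page.

```python
import requests
from bs4 import BeautifulSoup

# The CDC FAQ page linked in the to-do list.
URL = "https://www.cdc.gov/coronavirus/2019-ncov/faq.html"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

qa_pairs = []
# Assumption: each question is a heading and its answer is the text that
# follows until the next heading.
for heading in soup.find_all(["h2", "h3", "h4"]):
    question = heading.get_text(strip=True)
    answer_parts = []
    for sibling in heading.find_next_siblings():
        if sibling.name in ("h2", "h3", "h4"):
            break
        answer_parts.append(sibling.get_text(" ", strip=True))
    answer = " ".join(part for part in answer_parts if part)
    if question.endswith("?") and answer:
        qa_pairs.append({"question": question, "answer": answer})

print(len(qa_pairs), "question/answer pairs scraped")
```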