Indah1 committed on
Commit
a08bb11
·
verified ·
1 Parent(s): 3038cdc

Update README.md

  1. README.md +7 -4
README.md CHANGED
@@ -50,13 +50,16 @@ The fine-tuning data used for BioChat is derived from the [ChatDoctor-5k](https:
50
  | Optimizer | AdamW_8Bit |
51
  | Warm Up Ratio | 0.03 |
52
  | Scheduler | Cosine |
53
- | Number of Epoch | 10 |
54
 
55
  ## Evaluation
56
 
57
- To determine the best model for fine-tuning, I used *perplexity* as a metric to evaluate performance and select the most optimal version. By leveraging the model's capabilities, I aim to evaluate its behavior and responses using tools like the *Word Embedding Association Test (WEAT)*. It is important to emphasize that its text generation features are intended solely for research purposes and are not yet suitable for production use. By releasing this model, we aim to drive advancements in biomedical NLP applications and contribute to best practices for the responsible development of domain-specific language models. Ensuring reliability, fairness, accuracy, and explainability remains a top priority for us.
58
-
59
-
 
 
 
60
  ### Framework versions
61
 
62
  - PEFT 0.11.1
 
50
  | Optimizer | AdamW_8Bit |
51
  | Warm Up Ratio | 0.03 |
52
  | Scheduler | Cosine |
53
+ | Number of Epoch | 5, 10, 15 |
54
 
55
  ## Evaluation
56
 
57
+ To select the best fine-tuned checkpoint, I used ***perplexity*** as the evaluation metric. I also evaluate the model's behavior and responses using tools such as the ***Word Embedding Association Test (WEAT)***. The table below lists the perplexity and WEAT scores for the model at epochs 5, 10, and 15, which were used to determine the best-performing version. Note that the model's text generation features are intended solely for research purposes and are not yet suitable for production use. By releasing this model, we aim to drive advancements in biomedical NLP applications and contribute to best practices for the responsible development of domain-specific language models. Ensuring reliability, fairness, accuracy, and explainability remains a top priority for us.
58
+ | Model Name | Perplexity Score | WEAT Score | Effect Size |
59
+ |:-------------------:|:----------------------------------:|:----------------------------------:|:----------------------------------:|
60
+ | **[BioChat5](https://huggingface.co/Indah1/BioChat5)** | **4.5799** | **-0.00652** | **-0.4059** |
61
+ | **[BioChat10](https://huggingface.co/Indah1/BioChat10)** | **4.5873** | **0.002351** | **0.06176** |
62
+ | **[BioChat15](https://huggingface.co/Indah1/BioChat15)** | **4.8864** | **0.00859** | **0.43890** |
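For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how they are commonly defined. It is not the evaluation script used for BioChat. It assumes that perplexity is the exponential of the mean per-token negative log-likelihood, that the "WEAT Score" column is the WEAT test statistic, and that "Effect Size" is the standard WEAT effect size computed with the sample standard deviation. The function names and the toy word vectors are hypothetical.

```python
import math
from statistics import mean, stdev


def perplexity(token_nlls):
    """Perplexity as exp(mean negative log-likelihood per token), natural log."""
    return math.exp(mean(token_nlls))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus attribute set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat(X, Y, A, B):
    """Return (test statistic, effect size) for target sets X, Y and attribute sets A, B.

    Assumption: the effect size uses the sample standard deviation over X and Y combined.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    test_statistic = sum(s_x) - sum(s_y)
    effect_size = (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)
    return test_statistic, effect_size


# Toy 2-D embeddings, purely illustrative (not BioChat data).
X = [[1.0, 0.0], [1.0, 0.1]]  # target set 1
Y = [[0.0, 1.0], [0.1, 1.0]]  # target set 2
A = [[1.0, 0.0]]              # attribute set 1
B = [[0.0, 1.0]]              # attribute set 2
statistic, effect = weat(X, Y, A, B)
```

Under these definitions, lower perplexity means the model assigns higher probability to the evaluation text, and a WEAT effect size closer to zero means a weaker measured association between the target and attribute sets.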
63
  ### Framework versions
64
 
65
  - PEFT 0.11.1