Checkpoint for paper MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain

Our paper was accepted to the EMNLP 2024 main conference as an oral presentation. It is available on arXiv.

This checkpoint is the best-performing medical sentence readability model trained on our dataset. It loads as a standard Hugging Face sequence classification model.

Please find more details in our repo.

Quickstart for the medical sentence readability model

# pip install transformers==4.35.2 torch --upgrade
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "chaojiang06/medreadme_medical_sentence_readability_prediction_CWI"
MAX_LEN  = 512

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
)
model.eval()

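def score_sentences(sentences):
    """Return one predicted readability score per input sentence."""
    # Tokenize the whole batch, padding to the longest sentence and
    # truncating anything beyond MAX_LEN tokens.
    enc = tokenizer(
        sentences,
        padding=True, truncation=True, max_length=MAX_LEN,
        return_tensors="pt"
    )
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        out = model(**enc).logits.squeeze(-1)  # shape: [batch]
    return out.tolist()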

print(score_sentences([
    "Take one tablet by mouth twice daily after meals.",
    "The pathophysiological sequelae of dyslipidemia necessitate..."
]))
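
For longer lists of sentences, scoring everything in one call may use more memory than is available. The following is a minimal sketch of one way to split the input into smaller batches. The helper name score_in_batches, the default batch size of 8, and the dummy_score stand-in are assumptions made for illustration and are not part of the released code.

```python
from typing import Callable, List


def score_in_batches(
    sentences: List[str],
    score_fn: Callable[[List[str]], List[float]],
    batch_size: int = 8,
) -> List[float]:
    """Apply score_fn to consecutive slices of sentences and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores: List[float] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        scores.extend(score_fn(batch))
    return scores


# Hypothetical stand-in scorer so the sketch runs without downloading the model:
# it returns each sentence's length in characters.
def dummy_score(batch: List[str]) -> List[float]:
    return [float(len(s)) for s in batch]


print(score_in_batches(["a", "bb", "ccc"], dummy_score, batch_size=2))
```

In practice, the quickstart's score_sentences function would be passed as score_fn in place of dummy_score.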

The sections below were automatically generated by the Hugging Face library.

roberta-large+cwi.py+512+8+1e-5+1

This model is a fine-tuned version of roberta-large on the cwi dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2137
  • Pearsonr: 0.8429
  • Addition Pearsonr: 0.8429
  • Addition Pearsonr Pvalue: 0.0000
  • Addition Spearmanr: 0.8297
  • Addition Spearmanr Pvalue: 0.0000
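
The Pearson and Spearman values above measure how closely predicted scores track the gold readability ratings. As a rough illustration of what these metrics compute, here is a plain-Python sketch. It is not the evaluation script used for this checkpoint, and the function names and sample numbers are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold ratings and predictions, for illustration only.
gold = [1.0, 2.5, 3.0, 4.5, 5.0]
pred = [1.2, 2.0, 3.4, 4.1, 5.3]
print(pearson(gold, pred), spearman(gold, pred))
```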

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 1
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10.0
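
To make the linear scheduler setting concrete, the sketch below shows how a linearly decaying learning rate could evolve from the listed peak of 1e-05. It assumes zero warmup steps, since no warmup is listed above, and uses a hypothetical total of 1000 optimizer steps because the training set size is not stated here. The function name linear_lr is illustrative only.

```python
def linear_lr(
    step: int,
    total_steps: int,
    base_lr: float = 1e-5,
    warmup_steps: int = 0,
) -> float:
    """Learning rate at a given step under linear warmup followed by linear decay to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical schedule length, for illustration only.
TOTAL_STEPS = 1000
for s in (0, 250, 500, 750, 1000):
    print(s, linear_lr(s, TOTAL_STEPS))
```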

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.15.0
  • Tokenizers 0.14.1