Checkpoint for the paper "MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain"

Our paper was accepted to the EMNLP 2024 main conference as an oral presentation. The paper is available on arXiv.

This checkpoint is the best-performing medical sentence readability model trained on our dataset. It loads as a standard Hugging Face sequence classification model.

Please find more details in our repo.

Quickstart for the medical sentence readability model

# pip install transformers==4.35.2 torch --upgrade
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "chaojiang06/medreadme_medical_sentence_readability_prediction_CWI"
MAX_LEN  = 512

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
)
model.eval()

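def score_sentences(sentences):
    """Return one predicted readability score per input sentence."""
    # Tokenize the whole list as one padded batch, truncating long inputs.
    enc = tokenizer(
        sentences,
        padding=True, truncation=True, max_length=MAX_LEN,
        return_tensors="pt"
    )
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        out = model(**enc).logits.squeeze(-1)  # shape: [batch]
    return out.tolist()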

print(score_sentences([
    "Take one tablet by mouth twice daily after meals.",
    "The pathophysiological sequelae of dyslipidemia necessitate..."
]))
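
The quickstart scores every sentence in a single batch, which may use a lot of memory for long lists. The sketch below shows one possible way to split the input into smaller chunks. The helper name `score_in_batches`, the default `batch_size` of 32, and the stand-in `dummy_score` function are assumptions made for illustration and are not part of this checkpoint or the transformers library.

```python
from typing import Callable, Iterable, List


def score_in_batches(
    sentences: Iterable[str],
    score_fn: Callable[[List[str]], List[float]],
    batch_size: int = 32,  # hypothetical default; tune to available memory
) -> List[float]:
    """Score sentences in fixed-size chunks, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(sentences)
    scores: List[float] = []
    for start in range(0, len(items), batch_size):
        # Each call to score_fn sees at most batch_size sentences.
        scores.extend(score_fn(items[start:start + batch_size]))
    return scores


# Stand-in scorer (word count) so the sketch runs without downloading the model.
def dummy_score(batch: List[str]) -> List[float]:
    return [float(len(s.split())) for s in batch]


demo = score_in_batches(["a b c", "d e", "f"], dummy_score, batch_size=2)
print(demo)  # [3.0, 2.0, 1.0]
```

With the model loaded, passing the quickstart's `score_sentences` as `score_fn` should work the same way, since it takes a list of strings and returns a list of floats.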

The sections below were automatically generated by the Hugging Face library.

roberta-large+cwi.py+512+8+1e-5+1

This model is a fine-tuned version of roberta-large on the cwi dataset. It achieves the following results on the evaluation set:

Loss: 0.2137
Pearsonr: 0.8429
Addition Pearsonr: 0.8429
Addition Pearsonr Pvalue: 0.0000
Addition Spearmanr: 0.8297
Addition Spearmanr Pvalue: 0.0000
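
For readers unfamiliar with the two correlation metrics above, the following is a minimal pure-Python sketch of their textbook definitions. It is not the evaluation code used to produce the reported numbers, it omits the p-values, and the `gold` and `pred` lists are made-up values for illustration only.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers for illustration only, not taken from the dataset.
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.5, 3.4, 5.1]
print(round(pearson(gold, pred), 4), round(spearman(gold, pred), 4))
```

Pearson measures how linearly the predictions track the gold scores, while Spearman only checks whether the two orderings agree, so a model can rank sentences well even when its raw scores are on a different scale.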

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 1
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10.0
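
To make the `linear` scheduler setting concrete, here is a rough sketch of how the learning rate would decay over training. It rests on several assumptions not stated in this card: no warmup steps, a single device, no gradient accumulation, and a hypothetical training set of 1000 examples (the real dataset size is not given here). The function names are illustrative only.

```python
import math

LEARNING_RATE = 1e-5   # from the hyperparameter list
TRAIN_BATCH_SIZE = 8   # from the hyperparameter list
NUM_EPOCHS = 10        # from the hyperparameter list
NUM_EXAMPLES = 1000    # hypothetical; the real dataset size is not stated
WARMUP_STEPS = 0       # assumption; warmup is not listed in the card


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, num_steps: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then linear decay to zero at num_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (num_steps - step) / max(1, num_steps - warmup))


steps = total_steps(NUM_EXAMPLES, TRAIN_BATCH_SIZE, NUM_EPOCHS)
for s in (0, steps // 2, steps):
    print(s, linear_lr(s, steps))
```

Under these assumptions the rate starts at 1e-05, is halved at the midpoint, and reaches zero at the final step.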

Framework versions

Transformers 4.35.2
Pytorch 2.1.1+cu121
Datasets 2.15.0
Tokenizers 0.14.1
