Academic Sentiment Classifier (DistilBERT)
A DistilBERT-based sequence classification model that predicts the sentiment polarity of academic peer-review text (binary: negative vs. positive). It supports research on evaluating the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible measurement of academic-style content.
Model details
- Architecture: DistilBERT for Sequence Classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
  - LABEL_0 -> negative
  - LABEL_1 -> positive
- Format: safetensors
Intended uses & limitations
Intended uses:
- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.
Limitations:
- Binary polarity only (no neutral class); confidence scores should be interpreted with care (see the thresholding sketch after this list).
- Domain-specific: optimized for academic review-style English text; may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.
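Because there is no neutral class, borderline reviews still receive a hard negative/positive label. Below is a minimal sketch of one way to flag low-confidence predictions; the 0.8 cutoff is an arbitrary placeholder, not something calibrated for this model:

from transformers import pipeline

clf = pipeline(
    task="text-classification",
    model="EvilScript/academic-sentiment-classifier",
    top_k=None,  # return scores for both labels
)

THRESHOLD = 0.8  # placeholder cutoff; tune on held-out data

scores = clf("The contribution is somewhat incremental, but the writing is clear.")
best = max(scores, key=lambda s: s["score"])
label = best["label"] if best["score"] >= THRESHOLD else "uncertain"
print(label, round(best["score"], 3))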
Ethical considerations and bias:
- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- Potential biases may reflect those present in the underlying corpus.
Training data
The model was fine-tuned on a corpus of academic peer-review text curated from reviews posted on OpenReview. The task is binary sentiment classification over review text spans.
Note: If you plan to use or extend the underlying data, please review the terms of use for OpenReview and any relevant dataset licenses.
Training procedure (high level)
- Base model: distilbert/distilbert-base-uncased (loaded via transformers)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard Trainer defaults (AdamW with a linear learning-rate schedule)
Exact hyperparameters may vary across runs and are not pinned in this card; an illustrative fine-tuning sketch follows.
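For readers who want to reproduce a comparable setup, here is a minimal fine-tuning sketch built on Trainer defaults. The two-example dataset and the hyperparameter values are illustrative placeholders; the actual OpenReview-derived corpus and exact hyperparameters are not published in this card.

from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Tiny stand-in dataset; replace with a real labeled review corpus
ds = Dataset.from_dict({
    "text": [
        "Strong results and clear writing.",
        "The evaluation is incomplete and the claims are overstated.",
    ],
    "label": [1, 0],  # 1 -> positive, 0 -> negative
}).map(lambda batch: tok(batch["text"], truncation=True, max_length=512), batched=True)

args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,              # illustrative value
    num_train_epochs=3,              # illustrative value
    per_device_train_batch_size=16,  # illustrative value
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorWithPadding(tok),  # dynamic padding per batch
)
trainer.train()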
How to use
Basic pipeline usage:
from transformers import pipeline

clf = pipeline(
    task="text-classification",
    model="EvilScript/academic-sentiment-classifier",
    tokenizer="EvilScript/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))  # returns the top label by default (use top_k=None for both scores)
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
If you prefer friendly labels, you can map them:
from transformers import pipeline
id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}
clf = pipeline("text-classification", model="EvilScript/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"]) # map to human-friendly label
print(res)
Batch inference:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("EvilScript/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "EvilScript/academic-sentiment-classifier"
).to(device).eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]
inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)

# Map to friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
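Note that the tokenizer call above truncates anything past 512 tokens, so very long reviews lose their tail. A simple heuristic workaround (not part of the model's training setup) is to score overlapping chunks and average their probabilities:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "EvilScript/academic-sentiment-classifier"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

def chunked_sentiment(text, max_len=512, stride=128):
    # Split the text into overlapping 512-token windows
    enc = tok(
        text,
        truncation=True,
        max_length=max_len,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(
            input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]
        ).logits
    # Average per-chunk probabilities (a heuristic aggregation choice)
    probs = torch.softmax(logits, dim=-1).mean(dim=0)
    return {"negative": probs[0].item(), "positive": probs[1].item()}

print(chunked_sentiment("The related-work discussion is thorough. " * 200))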
Evaluation
If you compute new metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.
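As a starting point, here is a minimal sketch for scoring the model against your own labeled examples with scikit-learn. The two texts and gold labels below are placeholders, not an official benchmark:

from sklearn.metrics import accuracy_score, f1_score
from transformers import pipeline

clf = pipeline("text-classification", model="EvilScript/academic-sentiment-classifier")

texts = [
    "The method is novel and the experiments are thorough.",
    "The paper overstates its contributions and lacks ablations.",
]
gold = [1, 0]  # 1 -> positive, 0 -> negative

preds = [int(r["label"].split("_")[1]) for r in clf(texts)]  # "LABEL_1" -> 1
print("accuracy:", accuracy_score(gold, preds))
print("f1:", f1_score(gold, preds))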
License
The model weights and card are released under the MIT license. Review and comply with any third-party data licenses if reusing the training data.
Citation
If you use this model, please cite the project:
@misc{federico_torrielli_2025,
  author    = {Federico Torrielli and Stefano Locci},
  title     = {academic-sentiment-classifier},
  year      = {2025},
  url       = {https://huggingface.co/EvilScript/academic-sentiment-classifier},
  doi       = {10.57967/hf/6535},
  publisher = {Hugging Face}
}