---
language: en
library_name: transformers
pipeline_tag: text-classification
license: mit
tags:
- sentiment-analysis
- distilbert
- sequence-classification
- academic-peer-review
- openreview
datasets:
- nhop/OpenReview
base_model:
- distilbert/distilbert-base-uncased
---

# Academic Sentiment Classifier (DistilBERT)

A DistilBERT-based sequence classification model that predicts the sentiment polarity of academic peer-review text (binary: negative vs. positive). It supports research on the sentiment of scholarly reviews and AI-generated critique, enabling large-scale, reproducible sentiment analysis of academic-style content.

## Model details

- Architecture: DistilBERT for Sequence Classification (2 labels)
- Max input length used during training: 512 tokens
- Labels:
  - LABEL_0 -> negative
  - LABEL_1 -> positive
- Format: `safetensors`

## Intended uses & limitations

Intended uses:

- Analyze sentiment of peer-review snippets, full reviews, or similar scholarly discourse.

Limitations:

- Binary polarity only (no neutral class); confidence scores should be interpreted with care.
- Domain-specific: optimized for academic review-style English text; may underperform on general-domain data.
- Not a replacement for human judgement or editorial decision-making.

Ethical considerations and bias:

- Scholarly reviews can contain technical jargon, hedging, and nuanced tone; polarity is an imperfect proxy for quality or fairness.
- The model may reflect biases present in the underlying corpus.

## Training data

The model was fine-tuned on a corpus of academic peer-review text curated from OpenReview reviews. The task is binary sentiment classification over review text spans.

Note: If you plan to use or extend the underlying data, please review the terms of use for OpenReview and any relevant dataset licenses.

## Training procedure (high level)

- Base model: DistilBERT (transformers)
- Objective: single-label binary classification
- Tokenization: standard DistilBERT tokenizer, truncation to 512 tokens
- Optimizer/scheduler: standard Trainer defaults (AdamW with a linear learning rate schedule)

Exact hyperparameters may vary across runs; the sketch below illustrates a typical setup.
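The exact training script for this checkpoint is not published. The following is a minimal, illustrative sketch of a fine-tuning run under the assumptions listed above (binary labels, 512-token truncation, Trainer defaults); the dataset files, column names, and hyperparameter values are placeholders, not the values used for this checkpoint.

```python
# Minimal fine-tuning sketch (illustrative; not the exact training script).
# Assumes a dataset with "text" and "label" (0 = negative, 1 = positive) columns.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

base = "distilbert/distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Placeholder files; substitute the curated OpenReview review corpus.
ds = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # Truncate to the 512-token limit used during training.
    return tok(batch["text"], truncation=True, max_length=512)

ds = ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,          # illustrative values
    per_device_train_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",  # Trainer default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    tokenizer=tok,               # enables dynamic padding via the default collator
)
trainer.train()
```

With Trainer defaults, AdamW and the linear schedule are applied automatically; `lr_scheduler_type` is spelled out above only for clarity.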
## How to use

Basic pipeline usage:

```python
from transformers import pipeline

clf = pipeline(
    task="text-classification",
    model="EvilScript/academic-sentiment-classifier",
    tokenizer="EvilScript/academic-sentiment-classifier",
)

text = "The paper is clearly written and provides strong empirical support for the claims."
print(clf(text))
# Example output: [{'label': 'LABEL_1', 'score': 0.97}]  # LABEL_1 -> positive
```

If you prefer friendly labels, you can map them:

```python
from transformers import pipeline

id2name = {"LABEL_0": "negative", "LABEL_1": "positive"}

clf = pipeline("text-classification", model="EvilScript/academic-sentiment-classifier")
res = clf("This section lacks clarity and the experiments are inconclusive.")[0]
res["label"] = id2name.get(res["label"], res["label"])  # map to a human-friendly label
print(res)
```

Batch inference:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained("EvilScript/academic-sentiment-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "EvilScript/academic-sentiment-classifier"
).to(device)
model.eval()

texts = [
    "I recommend acceptance; the methodology is solid and results are convincing.",
    "Major concerns remain; the evaluation is incomplete and unclear.",
]

inputs = tok(texts, padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
pred_ids = probs.argmax(dim=-1)

# Map to friendly labels
id2name = {0: "negative", 1: "positive"}
preds = [id2name[i.item()] for i in pred_ids]
print(list(zip(texts, preds)))
```

## Evaluation

No benchmark metrics are reported for this checkpoint yet. If you compute metrics on public datasets or benchmarks, consider sharing them via a pull request to this model card.

## License

The model weights and this card are released under the MIT license. Review and comply with any third-party data licenses if you reuse the training data.

## Citation

If you use this model, please cite the project:

```bibtex
@misc{federico_torrielli_2025,
  author    = {Federico Torrielli and Stefano Locci},
  title     = {academic-sentiment-classifier},
  year      = 2025,
  url       = {https://huggingface.co/EvilScript/academic-sentiment-classifier},
  doi       = {10.57967/hf/6535},
  publisher = {Hugging Face}
}
```