---
license: apache-2.0
tags:
- unsloth
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
---

# Pub-Guard-Llama-8B

**Pub-Guard-Llama-8B** is a fine-tuned version of Llama-3.1-8B designed to detect fraudulent academic papers.

Benefits of using this model:
* It is the <u>first</u> LLM specifically designed for fraud detection in scientific articles.
* It integrates external resources for better analysis (Semantic Scholar, OpenAlex, <u>PubMed</u>...).
* It offers <u>uncertainty-aware</u> predictions with <u>faithful</u> explanations.

**Datasets**
* [Pubmed Retraction](https://huggingface.co/datasets/Lihuchen/pubmed_retraction)
* [Instruction](https://huggingface.co/datasets/Lihuchen/pubmed_retraction_instruction)

**Zero-Shot Reasoning**

| Model                       | Breast Cancer | Lung Cancer |
|-----------------------------|---------------|-------------|
| Llama-3.1-8B-Instruct       | 20.0          | 46.2        |
| OpenScholar-8B              | 18.8          | 20.3        |
| Bio-Medical-Llama-3-8B      | 40.7          | 56.7        |
| PMC-LLaMA-13B               | 36.4          | -           |
| Pub-Guard-Llama-8B (Ours)   | 69.5          | 78.9        |
| Pub-Guard-Llama-8B (RAG)    | -             | -           |
| Pub-Guard-Llama-8B (Debate) | -             | -           |

**Coherence Check**
* Model: cross-encoder/nli-deberta-v3-base
* Input pairs: Article -> (Sentence in Explanation)
* Instruction Data (gpt-4o): 88.7
* Predicted Data (pub-guard-llama): -

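The coherence check above can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation script: it assumes the score is the percentage of explanation sentences that the NLI model labels as entailed by the article, and it uses a naive regex sentence splitter. The helper names (`split_sentences`, `coherence_score`, `nli_predictor`) are hypothetical. The label order follows the `cross-encoder/nli-deberta-v3-base` model card.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Label order documented on the cross-encoder/nli-deberta-v3-base model card.
NLI_LABELS = ["contradiction", "entailment", "neutral"]

Pair = Tuple[str, str]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! and ? (assumption: the real splitter is not stated)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def coherence_score(
    article: str,
    explanation: str,
    predict_labels: Callable[[Sequence[Pair]], List[str]],
) -> float:
    """Percentage of explanation sentences labelled 'entailment' given the article.

    Assumption: this aggregation is a guess at how the reported score is defined.
    """
    sentences = split_sentences(explanation)
    if not sentences:
        return 0.0
    pairs = [(article, sentence) for sentence in sentences]
    labels = predict_labels(pairs)
    entailed = sum(1 for label in labels if label == "entailment")
    return 100.0 * entailed / len(sentences)


def nli_predictor(model_name: str = "cross-encoder/nli-deberta-v3-base"):
    """Build a predictor backed by sentence-transformers (imported lazily)."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)

    def predict(pairs: Sequence[Pair]) -> List[str]:
        scores = model.predict(list(pairs))  # one row of 3 logits per pair
        return [NLI_LABELS[int(i)] for i in scores.argmax(axis=1)]

    return predict
```

With `sentence-transformers` installed, `coherence_score(article_text, explanation_text, nli_predictor())` would score one article and explanation pair. Passing the predictor as an argument keeps the aggregation logic separate from the model call, so it can be checked with a stub.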
## Examples

### Non-Retracted Article

**Input**:
```text
Title: Bioinformatics-Based Discovery of CKLF-Like MARVEL Transmembrane Member 5 as a Novel Biomarker for Breast Cancer.

Abstract: Chemokine-like factor (CKLF)-like MARVEL transmembrane members (CMTMs) represent a novel protein family linking the
chemokine and transmembrane-4 superfamily families, which potentially play several roles in diverse physiological and pathological processes.
The detailed functions and underlying molecular mechanisms of CMTMs remain elusive in breast cancer.
Herein, we performed a comprehensive bioinformatic analysis to investigate the prognostic effect, potential functions,
and biomolecular regulatory network of CMTMs in breast cancer.
...

Authors: Juan Zhou (author h-index: 22, Established Researcher); J. Lei (author h-index: 10, Early Career Researcher);
Jun Wang (author h-index: 17, Established Researcher); Chen-Lu Lian (author h-index: 10, Early Career Researcher);
L. Hua (author h-index: 9, Early Career Researcher); Zhen-Yu He (author h-index: 29, Established Researcher);
San-Gang Wu (author h-index: 28, Established Researcher)

Institutions: Sun Yat-sen University Cancer Center, Guangzhou, China. (institution average citation: 21.0, Established Institution);
Teaching Hospital of Fujian Medical University, Xiamen, China. (institution average citation: 60.0, World-Class Institution)

Journal: frontiers in cell and developmental biology (journal JCR: Q1, Top Level Journal)
```

**Output**:
```text
No
The article should be accepted due to its publication in a top-tier journal with a rigorous peer review process,
the authors' strong credibility with established h-indices, and affiliations with reputable institutions.
The study presents original, well-supported findings on CMTM5's role in breast cancer, with no evidence of data fabrication or plagiarism.
```

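The **Input** blocks in these examples follow a fixed layout: title, abstract, authors with h-index and tier, institutions with average citation and tier, and journal with a note. The sketch below assembles such a string from structured fields. The names `Author`, `Institution`, and `build_input` are hypothetical, the tier labels are passed in as given because the card does not state how they are derived, and each field is emitted on a single line, which may differ from the line wrapping used in training.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Author:
    name: str
    h_index: int
    tier: str  # e.g. "Established Researcher"; supplied by the caller


@dataclass
class Institution:
    name: str
    avg_citation: float
    tier: str  # e.g. "Emerging Institution"; supplied by the caller


def build_input(
    title: str,
    abstract: str,
    authors: List[Author],
    institutions: List[Institution],
    journal: str,
    journal_note: str,
) -> str:
    """Format article metadata in the layout shown in the examples."""
    author_part = "; ".join(
        f"{a.name} (author h-index: {a.h_index}, {a.tier})" for a in authors
    )
    institution_part = "; ".join(
        f"{i.name} (institution average citation: {i.avg_citation}, {i.tier})"
        for i in institutions
    )
    return "\n\n".join(
        [
            f"Title: {title}",
            f"Abstract: {abstract}",
            f"Authors: {author_part}",
            f"Institutions: {institution_part}",
            f"Journal: {journal} ({journal_note})",
        ]
    )


# Usage with placeholder values (not taken from a real article).
example = build_input(
    title="Example title.",
    abstract="Example abstract.",
    authors=[Author("A. Author", 22, "Established Researcher")],
    institutions=[Institution("Example University.", 21.0, "Established Institution")],
    journal="example journal",
    journal_note="journal JCR: Q1, Top Level Journal",
)
print(example)
```

Keeping the tier labels as plain inputs avoids guessing the thresholds behind labels such as "Early Career Researcher".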
### Retracted Article

**Input**:
```text
Title: Changes and Influencing Factors of Cognitive Impairment in Patients with Breast Cancer.

Abstract: To investigate the changes in cognitive function and its influencing factors in patients with breast cancer after chemotherapy,
to provide a scientific basis for further cognitive correction therapy. In this study, general information on age,
marital status, and chemotherapy regimen was collected from 172 breast cancer chemotherapy patients.
172 patients with breast cancer undergoing chemotherapy were investigated by convenience sampling method,
and the subjects were tested one-on-one using the Chinese version of the MATRICS Consensus Cognitive Battery (MCCB) computer system.
...

Authors: Huixia Cui (author h-index: 6, Early Career Researcher);
Xiaoxiu Song (author h-index: 1, Emerging Researcher); Wenlu Zhang (author h-index: 7, Early Career Researcher)

Institutions: College of Nursing, Jinzhou Medical University, Jinzhou, Liaoning 121001, China. (institution average citation: 9.0, Emerging Institution);
Department of Intensive Care Medicine, Liaocheng People's Hospital, Liaocheng, Shandong 252000, China. (institution average citation: 10.0, Emerging Institution);

Journal: evidence-based complementary and alternative medicine : ecam (unknown journal)
```

**Output**:
```text
Yes
The article should be retracted due to potential data fabrication or manipulation, as the reported linear regression results lack clarity and statistical validation.
Additionally, the journal's reputation and peer review rigor are questionable, and the authors' affiliations are emerging with low citation averages,
raising concerns about research reliability.
```

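In both examples the output starts with a `Yes` or `No` line (`Yes` meaning the article should be retracted), followed by the explanation. A small parser for that shape might look like the sketch below. `Verdict` and `parse_output` are hypothetical names, and the code assumes the model always emits the label on the first non-empty line, which the two examples suggest but the card does not guarantee.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    retracted: bool   # True when the first line is "Yes"
    explanation: str  # remaining lines joined into one paragraph


def parse_output(raw: str) -> Verdict:
    """Split a model response into the Yes/No label and the explanation text."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty model output")
    label = lines[0].rstrip(".").lower()
    if label not in ("yes", "no"):
        raise ValueError(f"expected 'Yes' or 'No' on the first line, got {lines[0]!r}")
    return Verdict(retracted=(label == "yes"), explanation=" ".join(lines[1:]))


# Usage with a shortened response in the same shape as the examples.
verdict = parse_output("Yes\nThe article should be retracted due to potential data fabrication.")
print(verdict.retracted, "-", verdict.explanation)
```

Raising on an unexpected first line, rather than defaulting to one label, keeps malformed generations from being silently counted as predictions.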