|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- reranker |
|
|
- cross-encoder |
|
|
- sequence-classification |
|
|
- vllm |
|
|
base_model: Qwen/Qwen3-Reranker-4B |
|
|
pipeline_tag: text-classification |
|
|
--- |
|
|
|
|
|
# Qwen3-Reranker-4B-seq-cls-vllm-fixed |
|
|
|
|
|
This is a fixed version of the Qwen3-Reranker-4B model, converted to the sequence-classification format and configured for use with vLLM.
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a pre-converted version of [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) that: |
|
|
- Has been converted from the CausalLM to the SequenceClassification architecture
|
|
- Includes proper configuration for vLLM compatibility |
|
|
- Provides ~75,000x reduction in classification head size |
|
|
- Offers ~150,000x fewer operations per token compared to using the full LM head |
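The conversion itself amounts to collapsing the full LM head into a single-row classification head: instead of computing every vocabulary logit and then taking `yes - no`, the two relevant rows are pre-subtracted into one classifier row. A minimal numpy sketch of the idea (matrix sizes and token ids here are toy values, not the real model's):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden = 1000, 64   # toy sizes; the real vocab is 151,936
yes_id, no_id = 42, 7           # stand-ins for the "yes"/"no" token ids

lm_head = rng.standard_normal((vocab_size, hidden))  # original CausalLM output head
h = rng.standard_normal(hidden)                      # final hidden state

# Full LM head: compute every vocab logit, then take yes - no.
full_logits = lm_head @ h
score_full = full_logits[yes_id] - full_logits[no_id]

# Converted head: pre-subtract the two rows into a single classifier row.
classifier = lm_head[yes_id] - lm_head[no_id]        # shape (hidden,)
score_converted = classifier @ h

assert np.allclose(score_full, score_converted)
print(score_converted)
```

The two scores are identical, which is where the "fewer operations per token" figure comes from: one dot product replaces a full vocabulary projection.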
|
|
|
|
|
## Key Improvements |
|
|
|
|
|
The original converted model ([tomaarsen/Qwen3-Reranker-4B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-4B-seq-cls)) was missing critical vLLM configuration attributes. This version adds: |
|
|
|
|
|
```json |
|
|
{ |
|
|
"classifier_from_token": ["no", "yes"], |
|
|
"method": "from_2_way_softmax", |
|
|
"use_pad_token": false, |
|
|
"is_original_qwen3_reranker": false |
|
|
} |
|
|
``` |
|
|
|
|
|
These configurations are essential for vLLM to properly handle the pre-converted weights. |
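In context, these keys sit alongside the standard sequence-classification fields in the model's `config.json`. A sketch of the relevant portion (fields other than the four listed above are illustrative, with all unrelated fields omitted):

```json
{
  "architectures": ["Qwen3ForSequenceClassification"],
  "num_labels": 1,
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}
```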
|
|
|
|
|
## Usage with vLLM |
|
|
|
|
|
```bash |
|
|
vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \ |
|
|
--task score \ |
|
|
--served-model-name qwen3-reranker-4b \ |
|
|
--disable-log-requests |
|
|
``` |
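Once the server is up, scoring requests go over HTTP. A sketch of the request body for vLLM's score endpoint (the `/score` path and field names follow recent vLLM versions; verify them against your version's docs):

```python
import json

# Request body for vLLM's score endpoint (example values).
payload = {
    "model": "qwen3-reranker-4b",  # must match --served-model-name
    "text_1": "What is the capital of France?",
    "text_2": "Paris is the capital of France.",
}
body = json.dumps(payload)
print(body)

# With a running server, something like:
#   curl -X POST http://localhost:8000/score \
#        -H "Content-Type: application/json" -d "$BODY"
```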
|
|
|
|
|
### Python Example |
|
|
|
|
|
```python |
|
|
from vllm import LLM |
|
|
|
|
|
llm = LLM( |
|
|
model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed", |
|
|
task="score" |
|
|
) |
|
|
|
|
|
queries = ["What is the capital of France?"] |
|
|
documents = ["Paris is the capital of France."] |
|
|
|
|
|
outputs = llm.score(queries, documents) |
|
|
scores = [output.outputs.score for output in outputs] |
|
|
print(scores) |
|
|
``` |
|
|
|
|
|
## Performance |
|
|
|
|
|
This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements: |
|
|
|
|
|
- **Memory**: ~600MB → ~8KB for classification head |
|
|
- **Compute**: 151,936 logits → 1 logit per forward pass |
|
|
- **Speed**: Faster inference due to reduced computation |
|
|
|
|
|
## Technical Details |
|
|
|
|
|
- **Architecture**: Qwen3ForSequenceClassification |
|
|
- **Base Model**: Qwen/Qwen3-Reranker-4B |
|
|
- **Conversion Method**: from_2_way_softmax (yes_logit - no_logit) |
|
|
- **Model Size**: 4B parameters |
|
|
- **Task**: Reranking/Scoring |
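The `from_2_way_softmax` name reflects a small identity: the two-way softmax probability of "yes" equals the sigmoid of the logit difference, so the single converted logit loses no information relative to the original yes/no pair. A quick numerical check:

```python
import math

yes_logit, no_logit = 2.3, -0.7  # arbitrary example values

# Two-way softmax probability of "yes"...
p_softmax = math.exp(yes_logit) / (math.exp(yes_logit) + math.exp(no_logit))

# ...equals the sigmoid of the single converted logit (yes - no).
p_sigmoid = 1.0 / (1.0 + math.exp(-(yes_logit - no_logit)))

assert abs(p_softmax - p_sigmoid) < 1e-12
print(p_softmax)
```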
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original Qwen3-Reranker: |
|
|
|
|
|
```bibtex |
|
|
@misc{qwen3reranker2025,


title={Qwen3-Reranker},


author={Qwen Team},


year={2025},
|
|
publisher={Hugging Face} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 (inherited from the base model) |
|
|
|