Qwen3-Reranker-4B-seq-cls-vllm-fixed

This is a fixed version of the Qwen3-Reranker-4B model converted to sequence classification format, optimized for use with vLLM.

Model Description

This model is a pre-converted version of Qwen/Qwen3-Reranker-4B that:

  • Has been converted from CausalLM to SequenceClassification architecture
  • Includes proper configuration for vLLM compatibility
  • Provides ~75,000x reduction in classification head size
  • Offers ~150,000x fewer operations per token compared to using the full LM head

Key Improvements

The original converted model (tomaarsen/Qwen3-Reranker-4B-seq-cls) was missing critical vLLM configuration attributes. This version adds:

{
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}

These configurations are essential for vLLM to properly handle the pre-converted weights.
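The idea behind `from_2_way_softmax` can be illustrated with a small NumPy sketch. The original model scores a document by comparing the "yes" and "no" logits of its language-modeling head; the conversion keeps only the difference of those two rows as a single-row classifier. The sizes and token ids below are toy stand-ins, not the real model's values:

```python
import numpy as np

# Toy stand-in: the real lm_head maps a hidden state to 151,936
# vocabulary logits, but the reranker only compares "yes" vs "no".
rng = np.random.default_rng(0)
hidden_size, vocab_size = 8, 32        # stand-ins for 2560 / 151,936
lm_head = rng.standard_normal((vocab_size, hidden_size))
no_id, yes_id = 3, 7                   # stand-ins for the "no"/"yes" token ids

# from_2_way_softmax: the 1-row classifier is the difference of the two rows.
classifier = lm_head[yes_id] - lm_head[no_id]      # shape (hidden_size,)

hidden = rng.standard_normal(hidden_size)          # a final hidden state
full_logits = lm_head @ hidden
# The single classifier logit equals yes_logit - no_logit exactly.
assert np.allclose(classifier @ hidden, full_logits[yes_id] - full_logits[no_id])
```

Because only this one row survives, the classification head shrinks from the full vocabulary projection to a single vector.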

Usage with vLLM

vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \
    --task score \
    --served-model-name qwen3-reranker-4b \
    --disable-log-requests
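Once the server is up, it exposes vLLM's scoring endpoint. A request sketch, assuming the default port and vLLM's `text_1`/`text_2` request fields (check your vLLM version's API docs if the shape differs):

```shell
# Hypothetical request against the server started above.
curl http://localhost:8000/score \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-reranker-4b",
        "text_1": "What is the capital of France?",
        "text_2": ["Paris is the capital of France."]
    }'
```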

Python Example

from vllm import LLM

llm = LLM(
    model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed",
    task="score"
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital of France."]

outputs = llm.score(queries, documents)
scores = [output.outputs.score for output in outputs]
print(scores)

Performance

This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements:

  • Memory: ~600MB → ~8KB for the classification head
  • Compute: 151,936 logits → 1 logit per forward pass
  • Speed: Faster inference due to reduced computation

Technical Details

  • Architecture: Qwen3ForSequenceClassification
  • Base Model: Qwen/Qwen3-Reranker-4B
  • Conversion Method: from_2_way_softmax (yes_logit - no_logit)
  • Model Size: 4B parameters
  • Task: Reranking/Scoring
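The conversion is score-preserving because a 2-way softmax over the yes/no logits depends only on their difference: applying a sigmoid to the single logit `yes_logit - no_logit` reproduces the original probability of "yes". A self-contained check:

```python
import math

def softmax_yes(yes_logit: float, no_logit: float) -> float:
    """P("yes") under a 2-way softmax over the yes/no logits."""
    ey, en = math.exp(yes_logit), math.exp(no_logit)
    return ey / (ey + en)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# The converted model's single logit is (yes_logit - no_logit);
# its sigmoid equals the original 2-way softmax probability of "yes".
yes_logit, no_logit = 2.3, -0.7
assert abs(sigmoid(yes_logit - no_logit) - softmax_yes(yes_logit, no_logit)) < 1e-12
```

This is why the single-logit model ranks documents identically to the original two-token formulation.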

Citation

If you use this model, please cite the original Qwen3-Reranker:

@misc{qwen3reranker2025,
  title={Qwen3-Reranker},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face}
}

License

Apache 2.0 (inherited from the base model)
