Qwen3-Reranker-4B-seq-cls-vllm-fixed

This is a fixed version of the Qwen3-Reranker-4B model converted to sequence classification format, optimized for use with vLLM.

Model Description

This model is a pre-converted version of Qwen/Qwen3-Reranker-4B that:

  • Has been converted from CausalLM to SequenceClassification architecture
  • Includes proper configuration for vLLM compatibility
  • Provides ~75,000x reduction in classification head size
  • Offers ~150,000x fewer operations per token compared to using the full LM head

Key Improvements

The original converted model (tomaarsen/Qwen3-Reranker-4B-seq-cls) was missing critical vLLM configuration attributes. This version adds:

{
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}

These configurations are essential for vLLM to properly handle the pre-converted weights.
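The idea behind `from_2_way_softmax` can be illustrated with a small NumPy sketch. The original model scores a document by comparing the "yes" and "no" logits of its language-modeling head; the conversion keeps only the difference of those two rows as a single-row classifier. The sizes and token ids below are toy stand-ins, not the real model's values:

```python
import numpy as np

# Toy stand-in: the real lm_head maps a hidden state to 151,936
# vocabulary logits, but the reranker only compares "yes" vs "no".
rng = np.random.default_rng(0)
hidden_size, vocab_size = 8, 32        # stand-ins for 2560 / 151,936
lm_head = rng.standard_normal((vocab_size, hidden_size))
no_id, yes_id = 3, 7                   # stand-ins for the "no"/"yes" token ids

# from_2_way_softmax: the 1-row classifier is the difference of the two rows.
classifier = lm_head[yes_id] - lm_head[no_id]      # shape (hidden_size,)

hidden = rng.standard_normal(hidden_size)          # a final hidden state
full_logits = lm_head @ hidden
# The single classifier logit equals yes_logit - no_logit exactly.
assert np.allclose(classifier @ hidden, full_logits[yes_id] - full_logits[no_id])
```

Because only this one row survives, the classification head shrinks from the full vocabulary projection to a single vector.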

Usage with vLLM

vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \
    --task score \
    --served-model-name qwen3-reranker-4b \
    --disable-log-requests
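Once the server is up, it exposes vLLM's scoring endpoint. A request sketch, assuming the default port and vLLM's `text_1`/`text_2` request fields (check your vLLM version's API docs if the shape differs):

```shell
# Hypothetical request against the server started above.
curl http://localhost:8000/score \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-reranker-4b",
        "text_1": "What is the capital of France?",
        "text_2": ["Paris is the capital of France."]
    }'
```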

Python Example

from vllm import LLM

llm = LLM(
    model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed",
    task="score"
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital of France."]

outputs = llm.score(queries, documents)
scores = [output.outputs.score for output in outputs]
print(scores)

Performance

This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements:

  • Memory: ~600MB → ~8KB for the classification head
  • Compute: 151,936 logits → 1 logit per forward pass
  • Speed: Faster inference due to reduced computation

Technical Details

  • Architecture: Qwen3ForSequenceClassification
  • Base Model: Qwen/Qwen3-Reranker-4B
  • Conversion Method: from_2_way_softmax (yes_logit - no_logit)
  • Model Size: 4B parameters
  • Task: Reranking/Scoring
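The conversion is score-preserving because a 2-way softmax over the yes/no logits depends only on their difference: applying a sigmoid to the single logit `yes_logit - no_logit` reproduces the original probability of "yes". A self-contained check:

```python
import math

def softmax_yes(yes_logit: float, no_logit: float) -> float:
    """P("yes") under a 2-way softmax over the yes/no logits."""
    ey, en = math.exp(yes_logit), math.exp(no_logit)
    return ey / (ey + en)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# The converted model's single logit is (yes_logit - no_logit);
# its sigmoid equals the original 2-way softmax probability of "yes".
yes_logit, no_logit = 2.3, -0.7
assert abs(sigmoid(yes_logit - no_logit) - softmax_yes(yes_logit, no_logit)) < 1e-12
```

This is why the single-logit model ranks documents identically to the original two-token formulation.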

Citation

If you use this model, please cite the original Qwen3-Reranker:

@misc{qwen3reranker2025,
  title={Qwen3-Reranker},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face}
}

License

Apache 2.0 (inherited from the base model)
