|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- reranker |
|
|
- cross-encoder |
|
|
- sequence-classification |
|
|
- vllm |
|
|
base_model: Qwen/Qwen3-Reranker-4B |
|
|
pipeline_tag: text-classification |
|
|
--- |
|
|
|
|
|
# Qwen3-Reranker-4B-seq-cls-vllm-fixed |
|
|
|
|
|
This is a fixed version of the Qwen3-Reranker-4B model, converted to the sequence-classification format and configured for use with vLLM.
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is a pre-converted version of [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) that: |
|
|
- Has been converted from the CausalLM to the SequenceClassification architecture
|
|
- Includes proper configuration for vLLM compatibility |
|
|
- Provides ~75,000x reduction in classification head size |
|
|
- Offers ~150,000x fewer operations per token compared to using the full LM head |
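The conversion itself amounts to collapsing the full LM head into a single-row classification head: instead of computing every vocabulary logit and then taking `yes - no`, the two relevant rows are pre-subtracted into one classifier row. A minimal numpy sketch of the idea (matrix sizes and token ids here are toy values, not the real model's):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden = 1000, 64   # toy sizes; the real vocab is 151,936
yes_id, no_id = 42, 7           # stand-ins for the "yes"/"no" token ids

lm_head = rng.standard_normal((vocab_size, hidden))  # original CausalLM output head
h = rng.standard_normal(hidden)                      # final hidden state

# Full LM head: compute every vocab logit, then take yes - no.
full_logits = lm_head @ h
score_full = full_logits[yes_id] - full_logits[no_id]

# Converted head: pre-subtract the two rows into a single classifier row.
classifier = lm_head[yes_id] - lm_head[no_id]        # shape (hidden,)
score_converted = classifier @ h

assert np.allclose(score_full, score_converted)
print(score_converted)
```

The two scores are identical, which is where the "fewer operations per token" figure comes from: one dot product replaces a full vocabulary projection.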
|
|
|
|
|
## Key Improvements |
|
|
|
|
|
The original converted model ([tomaarsen/Qwen3-Reranker-4B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-4B-seq-cls)) was missing critical vLLM configuration attributes. This version adds: |
|
|
|
|
|
```json |
|
|
{ |
|
|
"classifier_from_token": ["no", "yes"], |
|
|
"method": "from_2_way_softmax", |
|
|
"use_pad_token": false, |
|
|
"is_original_qwen3_reranker": false |
|
|
} |
|
|
``` |
|
|
|
|
|
These configurations are essential for vLLM to properly handle the pre-converted weights. |
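In context, these keys sit alongside the standard sequence-classification fields in the model's `config.json`. A sketch of the relevant portion (fields other than the four listed above are illustrative, with all unrelated fields omitted):

```json
{
  "architectures": ["Qwen3ForSequenceClassification"],
  "num_labels": 1,
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}
```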
|
|
|
|
|
## Usage with vLLM |
|
|
|
|
|
```bash |
|
|
vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \ |
|
|
--task score \ |
|
|
--served-model-name qwen3-reranker-4b \ |
|
|
--disable-log-requests |
|
|
``` |
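Once the server is up, scoring requests go over HTTP. A sketch of the request body for vLLM's score endpoint (the `/score` path and field names follow recent vLLM versions; verify them against your version's docs):

```python
import json

# Request body for vLLM's score endpoint (example values).
payload = {
    "model": "qwen3-reranker-4b",  # must match --served-model-name
    "text_1": "What is the capital of France?",
    "text_2": "Paris is the capital of France.",
}
body = json.dumps(payload)
print(body)

# With a running server, something like:
#   curl -X POST http://localhost:8000/score \
#        -H "Content-Type: application/json" -d "$BODY"
```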
|
|
|
|
|
### Python Example |
|
|
|
|
|
```python |
|
|
from vllm import LLM |
|
|
|
|
|
llm = LLM( |
|
|
model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed", |
|
|
task="score" |
|
|
) |
|
|
|
|
|
queries = ["What is the capital of France?"] |
|
|
documents = ["Paris is the capital of France."] |
|
|
|
|
|
outputs = llm.score(queries, documents) |
|
|
scores = [output.outputs.score for output in outputs] |
|
|
print(scores) |
|
|
``` |
|
|
|
|
|
## Performance |
|
|
|
|
|
This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements: |
|
|
|
|
|
- **Memory**: ~600MB → ~8KB for classification head |
|
|
- **Compute**: 151,936 logits → 1 logit per forward pass |
|
|
- **Speed**: Faster inference due to reduced computation |
|
|
|
|
|
## Technical Details |
|
|
|
|
|
- **Architecture**: Qwen3ForSequenceClassification |
|
|
- **Base Model**: Qwen/Qwen3-Reranker-4B |
|
|
- **Conversion Method**: from_2_way_softmax (yes_logit - no_logit) |
|
|
- **Model Size**: 4B parameters |
|
|
- **Task**: Reranking/Scoring |
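The `from_2_way_softmax` name reflects a small identity: the two-way softmax probability of "yes" equals the sigmoid of the logit difference, so the single converted logit loses no information relative to the original yes/no pair. A quick numerical check:

```python
import math

yes_logit, no_logit = 2.3, -0.7  # arbitrary example values

# Two-way softmax probability of "yes"...
p_softmax = math.exp(yes_logit) / (math.exp(yes_logit) + math.exp(no_logit))

# ...equals the sigmoid of the single converted logit (yes - no).
p_sigmoid = 1.0 / (1.0 + math.exp(-(yes_logit - no_logit)))

assert abs(p_softmax - p_sigmoid) < 1e-12
print(p_softmax)
```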
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite the original Qwen3-Reranker: |
|
|
|
|
|
```bibtex |
|
|
@misc{qwen3reranker2025,


title={Qwen3-Reranker},


author={Qwen Team},


year={2025},
|
|
publisher={Hugging Face} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 (inherited from the base model) |
|
|
|