---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---
# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
LLM2Vec4CXR is optimized for chest X-ray report analysis and medical text understanding.
It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
## Model Description
LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
This design enhances semantic representation of chest X-ray reports, making the model robust across different reporting styles and effective even with domain-specific abbreviations.
It improves performance on clinical text similarity, retrieval, and interpretation tasks.
### Key Features
- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent Attention (fine-tuned weights automatically loaded)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16
- **Automatic Loading**: Latent attention weights are automatically loaded from safetensors
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding
## Training Details
### Training Data
- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings
### Training Configuration
- **Pooling Mode**: `latent_attention` (modified from base model)
- **Enable Bidirectional**: True
- **Max Length**: 512
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training
## Usage
### Installation
```bash
# Install the LLM2Vec4CXR package directly from GitHub
pip install git+https://github.com/lukeingawesome/llm2vec4cxr.git
# Or clone and install in development mode
git clone https://github.com/lukeingawesome/llm2vec4cxr.git
cd llm2vec4cxr
pip install -e .
```
### Basic Usage
```python
import torch
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
# Load the model - latent attention weights are automatically loaded!
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    pooling_mode="latent_attention",
    max_length=512,
    enable_bidirectional=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device).eval()
# Configure tokenizer
model.tokenizer.padding_side = 'left'
# Simple text encoding
report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
embedding = model.encode_text([report])
# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis."
]
embeddings = model.encode_text(reports)
```
### Advanced Usage with Instructions and Similarity
```python
# For instruction-following tasks with separator
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
query_text = instruction + '!@#$%^&*()' + report
# Compare against multiple options
candidates = [
    'No pleural effusion',
    'Pleural effusion present',
    'Pleural effusion is worsening',
    'Pleural effusion is improving'
]
# Get similarity scores using the built-in method
similarities = model.compute_similarities(query_text, candidates)
print(f"Similarities: {similarities}")
# For custom separator-based encoding
embeddings = model.encode_with_separator([query_text], separator='!@#$%^&*()')
```
**Note**: The model includes convenience methods such as `compute_similarities()` and `encode_with_separator()` that handle the separator-aware tokenization automatically.
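If you prefer to work with the raw embeddings, the scores can also be computed manually with cosine similarity. The sketch below is a minimal example and assumes that `encode_with_separator` and `encode_text` both return `[batch, dim]` tensors, as in the calls above.

```python
# Minimal sketch: manual cosine similarity from the raw embeddings.
# Assumes encode_with_separator / encode_text return [batch, dim] tensors.
import torch.nn.functional as F

query_emb = model.encode_with_separator([query_text], separator='!@#$%^&*()')
cand_embs = model.encode_text(candidates)
manual_scores = F.cosine_similarity(query_emb.float(), cand_embs.float(), dim=-1)
print(manual_scores)  # should broadly agree with model.compute_similarities(query_text, candidates)
```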
### Quick Start Example
Here's a complete example showing the model's capabilities:
```python
import torch
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    pooling_mode="latent_attention",
    max_length=512,
    enable_bidirectional=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device).eval()
# Configure tokenizer
model.tokenizer.padding_side = 'left'
# Medical text analysis
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
query = instruction + '!@#$%^&*()' + report
# Compare with different diagnoses
options = [
    'No pleural effusion',
    'Pleural effusion is worsening',
    'Pleural effusion is stable',
    'Pleural effusion is improving'
]
# Get similarity scores
scores = model.compute_similarities(query, options)
best_match = options[torch.argmax(scores)]
print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
```
Or retrieving clinically similar reports:
```python
import torch
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    pooling_mode="latent_attention",
    max_length=512,
    enable_bidirectional=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device).eval()
# Configure tokenizer
model.tokenizer.padding_side = 'left'
# Instruction for retrieval
instruction = 'Retrieve semantically similar sentences'
query_report = "There is a small LLLF PE with basal atelectasis."
query_text = instruction + '!@#$%^&*()' + query_report
# Candidate reports
candidate_reports = [
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion is present.",
    "Large right pleural effusion causing compressive atelectasis.",
    "Heart size is normal with no evidence of pleural effusion.",
    "There is left pleural effusion."
]
# Compute similarity scores
scores = model.compute_similarities(query_text, candidate_reports)
# Retrieve the most similar report
best_match = candidate_reports[torch.argmax(scores)]
print(f"Most similar report: {best_match} (score: {torch.max(scores):.4f})")
```
## API Reference
The model provides several convenient methods:
### Core Methods
- **`encode_text(texts)`**: Simple text encoding with automatic embed_mask handling
- **`encode_with_separator(texts, separator='!@#$%^&*()')`**: Encoding with instruction/content separation
- **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
- **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading
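As a rough illustration of how these methods compose, the following sketch builds a tiny in-memory retrieval index on top of `encode_text`. It assumes the embeddings come back as a `[N, dim]` torch tensor (as in the examples above); the corpus shown is purely illustrative.

```python
# Minimal sketch: a small in-memory retrieval index using encode_text.
import torch
import torch.nn.functional as F

corpus = [
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion is present.",
    "Large right pleural effusion causing compressive atelectasis.",
]
# Normalize once so a dot product equals cosine similarity.
corpus_embs = F.normalize(model.encode_text(corpus).float(), dim=-1)

query_embs = F.normalize(model.encode_text(["Left-sided effusion, small."]).float(), dim=-1)
scores = query_embs @ corpus_embs.T  # [1, N] cosine similarities
print(corpus[int(scores.argmax())])
```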
📄 **Related Papers**:
- [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
*Ko, Hanbin, et al. "Exploring the capabilities of LLM encoders for image–text retrieval in chest X-rays." arXiv preprint arXiv:2509.15234 (2025).*
- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays.
## Evaluation
The model has been evaluated on chest X-ray report analysis tasks, particularly:
- Text retrieval (as the text encoder)
- Medical text similarity comparison
- Clinical finding extraction
### Sample Performance
The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
In particular, **LLM2Vec4CXR** shows stronger performance in:
- Handling medical abbreviations and radiological terminology
- Capturing fine-grained semantic differences in chest X-ray reports
## Intended Use
### Primary Use Cases
- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis
### Limitations
- Specialized for chest X-ray reports - may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis
## Technical Specifications
- **Model Type**: Bidirectional Language Model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2)
- **Parameters**: ~1B parameters
- **Input Length**: Up to 512 tokens
- **Output**: Dense embeddings
- **Precision**: bfloat16
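A quick way to sanity-check these specifications after loading the model is to inspect a single embedding, as in the sketch below. The exact embedding dimension depends on the pooling head (the Llama 3.2 1B backbone has a hidden size of 2048), and the output dtype follows the `torch_dtype` used at load time; both values in the comment are assumptions rather than guarantees.

```python
# Minimal sketch: inspect the shape and dtype of a single embedding.
emb = model.encode_text(["No acute cardiopulmonary abnormality."])
print(emb.shape, emb.dtype)  # e.g. torch.Size([1, 2048]) and torch.bfloat16 (assumed, not guaranteed)
```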
## Citation
If you use this model in your research, please cite:
```bibtex
@article{ko2025exploring,
title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays},
author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min},
journal={arXiv preprint arXiv:2509.15234},
year={2025}
}
```
The preprint describing this model is available on arXiv (see the citation above).
## Acknowledgments
This model is built upon:
- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models
## License
This model is licensed under the MIT License.