Model Card: ROYXAI [Vision Transformer + VGG19 + ResNet50 Ensemble with Grad-CAM]

Model Description

This model is an ensemble of three deep learning architectures: Vision Transformer (ViT), VGG19, and ResNet50. The ensemble approach enhances classification performance on medical image datasets related to ocular diseases. The model also integrates Grad-CAM visualization to highlight regions of interest for better interpretability.

Model Details

  • Model Name: ROYXAI
  • Developed by: Avishek Roy Sparsho
  • Framework: PyTorch
  • Ensemble Method: Bagging
  • Backbone Models: Vision Transformer, VGG19, ResNet50
  • Target Task: Medical Image Classification
  • Supported Classes:
    • OT
    • Healthy
    • SC_diabetes
    • SC_cataract
    • SC_glucoma
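
The usage snippets in this card access the members through ensemble_model.models, so the ensemble is presumably a wrapper around a list of three classifiers. The repository's actual wrapper is not shown in this card. The following is only a minimal sketch of one common aggregation step, averaging each member's softmax probabilities. The class name AveragingEnsemble and the helper tiny_classifier are hypothetical, and the tiny stand-in networks replace the real ViT, VGG19, and ResNet50 backbones so the example runs quickly.

```python
import torch
import torch.nn as nn


class AveragingEnsemble(nn.Module):
    """Hypothetical wrapper: averages the softmax outputs of its members."""

    def __init__(self, models):
        super().__init__()
        # ModuleList registers each member so .eval(), .to(), and state_dict() see it.
        self.models = nn.ModuleList(models)

    def forward(self, x):
        # Each member returns logits of shape (batch, num_classes).
        probs = [torch.softmax(member(x), dim=1) for member in self.models]
        # The mean over members gives the ensemble class probabilities.
        return torch.stack(probs, dim=0).mean(dim=0)


def tiny_classifier(num_classes=5):
    # Stand-in for a real backbone (assumption, used only to keep the sketch light).
    return nn.Sequential(
        nn.AdaptiveAvgPool2d(8),
        nn.Flatten(),
        nn.Linear(3 * 8 * 8, num_classes),
    )


ensemble = AveragingEnsemble([tiny_classifier() for _ in range(3)])
ensemble.eval()
with torch.no_grad():
    batch = torch.randn(2, 3, 224, 224)  # two random 224x224 RGB "images"
    probabilities = ensemble(batch)

print(probabilities.shape)  # torch.Size([2, 5])
```

Because the output rows are probabilities, torch.argmax(output, dim=1) picks the same class it would pick from averaged scores. Whether the released model averages probabilities, averages logits, or uses voting is not stated in this card.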

Uses

Direct Use

This model is designed for medical image classification: it detects ocular diseases and their secondary complications and visualizes the image regions behind each prediction.

Downstream Use

The model can be fine-tuned on other medical image datasets to improve performance on specific conditions.

Out-of-Scope Use

The model is not suitable for non-medical image classification tasks, and it must not be used as a standalone medical diagnostic tool.

Bias, Risks, and Limitations

  • This model is trained on a specific dataset and may not generalize well to other medical image datasets without fine-tuning.
  • It is not a substitute for professional medical diagnosis.
  • The Vision Transformer model is computationally expensive compared to CNNs.

Training Details

Dataset

  • Dataset Name: Custom Ocular Disease and Secondary Complications Dataset
  • Dataset Source: Private Dataset (Medical Images)
  • Dataset Structure: Images stored in folders based on class labels
  • Preprocessing:
    • Resized images to 224x224 pixels
    • Normalized using ImageNet mean and standard deviation

Training Procedure

  • Optimizer: Adam with weight decay
  • Learning Rate Scheduler: Cosine Annealing LR
  • Loss Function: Cross-Entropy Loss
  • Batch Size: 32
  • Training Epochs: 20
  • Hardware Used: T4 GPU x2
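
As a rough illustration of how the listed optimizer, scheduler, and loss fit together in PyTorch, the sketch below wires them around a small stand-in model and random tensors. The learning rate and weight decay values are assumptions (the card does not state them), and each "epoch" here is a single batch to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in classifier; the real run would train the ViT, VGG19, and ResNet50 members.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))

EPOCHS = 20           # from this card
BATCH_SIZE = 32       # from this card
LEARNING_RATE = 1e-4  # assumed; not stated in this card
WEIGHT_DECAY = 1e-4   # assumed; not stated in this card

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# Random tensors in place of the private dataset.
inputs = torch.randn(BATCH_SIZE, 3, 32, 32)
targets = torch.randint(0, 5, (BATCH_SIZE,))

learning_rates = []
for epoch in range(EPOCHS):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)  # expects raw logits and class indices
    loss.backward()
    optimizer.step()
    scheduler.step()  # one scheduler step per epoch
    learning_rates.append(scheduler.get_last_lr()[0])

print(f"final loss: {loss.item():.4f}")
print(f"first and last learning rate: {learning_rates[0]:.2e}, {learning_rates[-1]:.2e}")
```

With T_max set to the number of epochs, the learning rate follows half a cosine period and decays toward zero by the last epoch.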

Model Performance

  • Accuracy: 98% on the test dataset
  • Precision/Recall/F1-score: Evaluated and optimized for medical diagnosis
  • Overfitting Prevention: Data augmentation, dropout, and weight regularization
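
For readers who want to reproduce per-class precision, recall, and F1 on their own test split, the sketch below derives them from a confusion matrix. The function name per_class_metrics is hypothetical, and the labels are made-up toy values, not results from this model.

```python
import torch


def per_class_metrics(y_true, y_pred, num_classes):
    """Return per-class precision, recall, and F1 as float tensors."""
    # confusion[i, j] counts samples with true class i predicted as class j.
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        confusion[t, p] += 1

    true_pos = confusion.diag().float()
    predicted = confusion.sum(dim=0).float()  # column sums: times each class was predicted
    actual = confusion.sum(dim=1).float()     # row sums: true samples per class

    # clamp avoids division by zero for classes that never occur.
    precision = true_pos / predicted.clamp(min=1)
    recall = true_pos / actual.clamp(min=1)
    f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-12)
    return precision, recall, f1


# Toy labels for three classes (illustrative only).
y_true = torch.tensor([0, 0, 1, 1, 2, 2])
y_pred = torch.tensor([0, 1, 1, 1, 2, 0])

precision, recall, f1 = per_class_metrics(y_true, y_pred, num_classes=3)
print(precision)  # tensor([0.5000, 0.6667, 1.0000])
print(recall)     # tensor([0.5000, 1.0000, 0.5000])
print(f1)         # tensor([0.5000, 0.8000, 0.6667])
```

Per-class values matter for medical data because a high overall accuracy can hide poor recall on a rare class.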

Installation and Usage

Clone the Repository

git clone https://huggingface.co/Aviroy/ROYXAI
cd ROYXAI

Install Dependencies

pip install -r requirements.txt

Training the Model

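To train the model from scratch, run the command below. It sets 50 epochs, whereas the run described under Training Procedure used 20, so adjust --epochs as needed: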

python train.py --epochs 50 --batch_size 32

Load Pretrained Model

To directly use the trained model:

import torch
from PIL import Image
import torchvision.transforms as transforms
from model import ensemble_model  # Trained ensemble model defined in this repository

# Run on the GPU when one is available, otherwise on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
ensemble_model = ensemble_model.to(device)  # Model and input must be on the same device

# Define image transformations (same preprocessing as in training)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load and preprocess an image, adding a batch dimension
image_path = "path/to/image.jpg"
image = Image.open(image_path).convert('RGB')
image = transform(image).unsqueeze(0).to(device)

# Perform inference
ensemble_model.eval()
with torch.no_grad():
    output = ensemble_model(image)
    predicted_class = torch.argmax(output, dim=1).item()

# Print the predicted class index
print("Predicted Class:", predicted_class)

Grad-CAM Visualization

Visualizing Attention Maps for Interpretability

Vision Transformer (ViT)

from visualization import visualize_gradcam_vit  # Function for ViT Grad-CAM

# Generate Grad-CAM visualization
overlay = visualize_gradcam_vit(ensemble_model.models[0], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for Vision Transformer")
plt.show()

ResNet50

from visualization import visualize_gradcam  # General Grad-CAM function

# Generate Grad-CAM visualization for ResNet50
overlay = visualize_gradcam(ensemble_model.models[2], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for ResNet50")
plt.show()

VGG19

from visualization import visualize_gradcam  # General Grad-CAM function

# Generate Grad-CAM visualization for VGG19
overlay = visualize_gradcam(ensemble_model.models[1], image, target_class=predicted_class)

# Display the Grad-CAM output
import matplotlib.pyplot as plt
plt.imshow(overlay)
plt.axis('off')
plt.title("Grad-CAM for VGG19")
plt.show()
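
The visualize_gradcam helpers above come from the repository's visualization module, and their internals are not shown in this card. To illustrate the general Grad-CAM idea for a CNN, here is a minimal sketch under stated assumptions: the function grad_cam and the small stand-in network are hypothetical, and the target layer is simply the last convolution. It weights each feature map by the spatial mean of its gradient with respect to the target class score, sums the weighted maps, and keeps the positive part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(model, target_layer, image, target_class):
    """Return a Grad-CAM heatmap in [0, 1] with the spatial size of `image`."""
    stored = {}

    def save_activation(module, inputs, output):
        stored["activation"] = output
        # Capture the gradient that flows back into this feature map.
        output.register_hook(lambda grad: stored.update(gradient=grad))

    handle = target_layer.register_forward_hook(save_activation)
    try:
        model.zero_grad()
        logits = model(image)                # gradients must be enabled here
        logits[0, target_class].backward()   # backpropagate the target class score
    finally:
        handle.remove()

    activation = stored["activation"].detach()           # (1, C, H, W)
    gradient = stored["gradient"]                        # (1, C, H, W)
    weights = gradient.mean(dim=(2, 3), keepdim=True)    # per-channel importance
    cam = F.relu((weights * activation).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / cam.max().clamp(min=1e-8))[0, 0]


torch.manual_seed(0)
# Small stand-in CNN (assumption); index 3 is the last convolutional layer.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(4),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 5),
).eval()

image = torch.randn(1, 3, 64, 64)
heatmap = grad_cam(cnn, cnn[3], image, target_class=2)
print(heatmap.shape)  # torch.Size([64, 64])
```

The heatmap can be overlaid on the input with matplotlib, for example plt.imshow(heatmap, cmap='jet', alpha=0.5) on top of the original image. Vision Transformers produce token sequences rather than spatial feature maps, so a ViT variant has to reshape patch tokens into a grid first, which is presumably why the repository provides a separate visualize_gradcam_vit function.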

Environmental Impact

  • Hardware Type: T4 GPU x2
  • Hours used: 50
  • Cloud Provider: Google Cloud (GCP)
  • Compute Region: US-Central1
  • Carbon Emitted: Estimated using Machine Learning Impact Calculator
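
The card does not report an emissions figure. As a back-of-the-envelope sketch of the kind of arithmetic such calculators perform (energy drawn multiplied by the grid's carbon intensity), the example below assumes both T4 GPUs ran at their 70 W rated board power for the full 50 hours. The carbon intensity value is a placeholder for illustration, not a measured figure for US-Central1, and the function name estimate_emissions is hypothetical.

```python
def estimate_emissions(num_gpus, gpu_power_watts, hours, carbon_intensity_kg_per_kwh):
    """Return (energy in kWh, emissions in kg CO2-equivalent) for a training run."""
    # Energy = number of GPUs x power per GPU (kW) x running time (h).
    energy_kwh = num_gpus * (gpu_power_watts / 1000.0) * hours
    return energy_kwh, energy_kwh * carbon_intensity_kg_per_kwh


PLACEHOLDER_INTENSITY = 0.5  # kg CO2e per kWh; illustrative placeholder only

energy_kwh, emissions_kg = estimate_emissions(
    num_gpus=2,             # from this card
    gpu_power_watts=70.0,   # rated board power of an NVIDIA T4; assumes full load
    hours=50.0,             # from this card; assumed to be wall-clock hours
    carbon_intensity_kg_per_kwh=PLACEHOLDER_INTENSITY,
)
print(f"energy: {energy_kwh:.1f} kWh")
print(f"emissions with placeholder intensity: {emissions_kg:.2f} kg CO2e")
```

Under these assumptions the energy term comes to about 7 kWh. The emissions value depends entirely on the carbon intensity substituted for the placeholder, and real runs also incur CPU, memory, and data-center overhead that this sketch ignores.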

Citation

If you use this model in your research, please cite:

@article{Sparsho2025,
  author    = {Avishek Roy Sparsho},
  title     = {ROYXAI Model For Proper Visualization of Classified Medical Image},
  journal   = {Medical AI Research},
  year      = {2025}
}

Acknowledgments

Special thanks to the open-source community and Kaggle for providing medical datasets for deep learning research.

Contact

For inquiries, please contact: Avishek Roy Sparsho

License

This model is released under the Apache 2.0 License. Use it responsibly.
