Mahmud ElHuseyni 🇵🇸
MElHuseyni
AI & ML interests
Computer Vision
NLP
Machine Learning
Recent Activity
upvoted a paper about 1 hour ago: OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
upvoted a collection about 5 hours ago: OpenVision 2
liked a model 1 day ago: nvidia/NVIDIA-Nemotron-Nano-12B-v2
SmolVLM
- HuggingFaceTB/SmolVLM-Instruct (Image-Text-to-Text • 2B • Updated • 79.5k • 540)
- OpenGVLab/InternVL3-1B (Image-Text-to-Text • 0.9B • Updated • 88.3k • 72)
- OpenGVLab/InternVL3-2B (Image-Text-to-Text • 2B • Updated • 50.1k • 36)
- LiquidAI/LFM2-VL-450M (Image-Text-to-Text • 0.5B • Updated • 7.35k • 110)
OCR Models
Visual Embedding Models
- jinaai/jina-embeddings-v4 (Visual Document Retrieval • 4B • Updated • 47.1k • 340)
- vidore/colqwen2.5-v0.2 (Visual Document Retrieval • Updated • 90k • 70)
- nomic-ai/colnomic-embed-multimodal-7b (Visual Document Retrieval • Updated • 6.99k • 84)
- nvidia/llama-nemoretriever-colembed-3b-v1 (Visual Document Retrieval • 4B • Updated • 2.26k • 41)
Speech Models
- ICTNLP/Llama-3.1-8B-Omni (9B • Updated • 232 • 410)
- AudioPaLM: A Large Language Model That Can Speak and Listen (Paper • 2306.12925 • Published • 54)
- fnlp/SpeechGPT-7B-cm (Text Generation • Updated • 133 • 7)
- parler-tts/parler_tts_mini_v0.1 (Text-to-Speech • 0.6B • Updated • 8.22k • 357)
Instance Segmentation
Image Segmentation Models
- nvidia/segformer-b5-finetuned-cityscapes-1024-1024 (Image Segmentation • Updated • 216k • 30)
- nvidia/segformer-b0-finetuned-ade-512-512 (Image Segmentation • 0.0B • Updated • 199k • 164)
- facebook/maskformer-swin-base-ade (Image Segmentation • Updated • 4.73k • 13)
- facebook/maskformer-swin-base-coco (Image Segmentation • 0.1B • Updated • 1.27k • 26)
Object Detection Models
Vision Language Leaderboards
- OCRBenchv2 Leaderboard (Running • 34): Display OCRBench leaderboard for text recognition models
- Vidore Leaderboard (Running • 170): Explore visual document retrieval benchmark results
- Open VLM Leaderboard (Running on CPU Upgrade • 874): VLMEvalKit Evaluation Results Collection
- Vision Arena (Testing VLMs side-by-side) (Running • 557): Display image analysis results
LLM Inference
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (Paper • 2401.08671 • Published • 15)
- NanoFlow: Towards Optimal Large Language Model Serving Throughput (Paper • 2408.12757 • Published • 18)
- richard-park/llama3-deepspeed-v1.0 (Text Generation • 8B • Updated • 424 • 1)
Arabic Models (LLM, VLM, Multimodal)