deepseek-ai
/

Janus-Pro-7B

@@ -1,60 +1,60 @@
----
-license: mit
-license_name: deepseek
-license_link: LICENSE
-pipeline_tag: any-to-any
-library_name: transformers
-tags:
-- muiltimodal
-- text-to-image
-- unified-model
----
-## 1. Introduction
-Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
-It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
-Janus surpasses previous unified model and matches or exceeds the performance of task-specific models.
-The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models.
-[**Github Repository**](https://github.com/deepseek-ai/Janus)
-<div align="center">
-<img alt="image" src="janus_pro_teaser1.png" style="width:90%;">
-</div>
-<div align="center">
-<img alt="image" src="janus_pro_teaser2.png" style="width:90%;">
-</div>
-### 2. Model Summary
-Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
-Janus is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
-For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus uses the tokenizer from [here](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
-## 3. Quick Start
-Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
-## 4. License
-This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
-## 5. Citation
-```
-@misc{chen2025januspro,
-      title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
-      author={Xiaokang Chen and Zhiyu Wu and Xingchao Liu and Zizheng Pan and Wen Liu and Zhenda Xie and Xingkai Yu and Chong Ruan},
-      year={2025},
-}
-```
-## 6. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).

+---
+license: mit
+license_name: deepseek
+license_link: LICENSE
+pipeline_tag: any-to-any
+library_name: transformers
+tags:
+- muiltimodal
+- text-to-image
+- unified-model
+---
+## 1. Introduction
+Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.
+It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility.
+Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models.
+The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.
+[**Github Repository**](https://github.com/deepseek-ai/Janus)
+<div align="center">
+<img alt="image" src="janus_pro_teaser1.png" style="width:90%;">
+</div>
+<div align="center">
+<img alt="image" src="janus_pro_teaser2.png" style="width:90%;">
+</div>
+### 2. Model Summary
+Janus-Pro is a unified understanding and generation MLLM, which decouples visual encoding for multimodal understanding and generation.
+Janus-Pro is constructed based on the DeepSeek-LLM-1.5b-base/DeepSeek-LLM-7b-base.
+For multimodal understanding, it uses the [SigLIP-L](https://huggingface.co/timm/ViT-L-16-SigLIP-384) as the vision encoder, which supports 384 x 384 image input. For image generation, Janus-Pro uses the tokenizer from [here](https://github.com/FoundationVision/LlamaGen) with a downsample rate of 16.
+## 3. Quick Start
+Please refer to [**Github Repository**](https://github.com/deepseek-ai/Janus)
+## 4. License
+This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). The use of Janus-Pro models is subject to [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL).
+## 5. Citation
+```
+@misc{chen2025januspro,
+      title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
+      author={Xiaokang Chen and Zhiyu Wu and Xingchao Liu and Zizheng Pan and Wen Liu and Zhenda Xie and Xingkai Yu and Chong Ruan},
+      year={2025},
+}
+```
+## 6. Contact
 If you have any questions, please raise an issue or contact us at [[email protected]](mailto:[email protected]).