Commit f739f3f (verified), committed by nielsr (HF staff) · Parent: aa20713

Add pipeline tag and paper page


This PR adds a pipeline tag and a link to the paper page, ensuring people can find your model at https://huggingface.co/models?pipeline_tag=any-to-any and https://huggingface.co/papers/2501.15368.
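For context, here is a minimal sketch of how the new tag makes the model discoverable programmatically, mirroring the web filter above. It assumes a recent `huggingface_hub` release in which `HfApi.list_models` accepts a `pipeline_tag` argument; the `author` and `limit` values are illustrative.

```python
from huggingface_hub import HfApi

# Query the Hub the same way the web filter
# https://huggingface.co/models?pipeline_tag=any-to-any does.
# Assumes a recent huggingface_hub where list_models() accepts pipeline_tag.
api = HfApi()
for model in api.list_models(pipeline_tag="any-to-any", author="baichuan-inc", limit=10):
    print(model.id)
```

The `pipeline_tag: any-to-any` entry added to the README front matter (see the diff below) is what places the model under that filter.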

Files changed (1): README.md (+28 -217)
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 license: apache-2.0
+ pipeline_tag: any-to-any
 ---
+
 <div align="center">
 
 <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/logo.png" width="300em" ></img>
@@ -14,7 +16,7 @@ license: apache-2.0
 
 
 <p align="center">
- Baichuan-Omni-1.5 <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5">🤗</a> | Baichuan-Omni-1.5-Base <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5-Base">🤗</a> |Github <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/">📖 </a> | Report <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/baichuan_omni_1_5.pdf">📖</a>
+ Baichuan-Omni-1.5 <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5">🤗</a> | Baichuan-Omni-1.5-Base <a href="https://huggingface.co/baichuan-inc/Baichuan-Omni-1d5-Base">🤗</a> |Github <a href="https://github.com/baichuan-inc/Baichuan-Omni-1.5/">📖 </a> | Report <a href="https://huggingface.co/papers/2501.15368">📖</a>
 </p>
 </p>
 <p align="center">
@@ -230,12 +232,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 
 </details>
 
-
 <details>
 
- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of image understanding ability.</summary>
 
- #### Image Understanding
+ #### Image understanding ability
 
 <div align="center">
 <table style="margin: 0 auto; text-align: center;">
@@ -248,11 +249,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 <tr>
 <td>Model</td>
 <td>Size</td>
- <td>MMBench-EN (Acc.)</td>
- <td>MMbench-CN (Acc.)</td>
- <td>SEED-IMG (Acc.)</td>
- <td>MMMU-val (Acc.)</td>
- <td>HallusionBench (Acc.)</td>
+ <td>MMBench-EN <br>(Acc.)</td>
+ <td>MMbench-CN <br>(Acc.)</td>
+ <td>SEED-IMG <br>(Acc.)</td>
+ <td>MMMU-val <br>(Acc.)</td>
+ <td>HallusionBench <br>(Acc.)</td>
 </tr>
 <tr>
 <td colspan="9">Proprietary Models</td>
@@ -281,7 +282,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 <tr>
 <td>Qwen2-VL-7B</td>
 <td>7B</td>
- <td><b>86.4<br></td>
+ <td><b>86.4<br></td>
 <td>81.9</td>
 <td><b>76.5<br></td>
 <td>52.7</td>
@@ -362,11 +363,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 <tr>
 <td>Model</td>
 <td>Size</td>
- <td>RealWorldQA (Acc.)</td>
- <td>MathVista-mini (Acc.)</td>
- <td>TextVQA-val (Acc.)</td>
- <td>ChartQA (Acc.)</td>
- <td>OCRBench (Acc.)</td>
+ <td>RealWorldQA <br>(Acc.)</td>
+ <td>MathVista-mini <br>(Acc.)</td>
+ <td>TextVQA-val <br>(Acc.)</td>
+ <td>ChartQA <br>(Acc.)</td>
+ <td>OCRBench <br>(Acc.)</td>
 </tr>
 <tr>
 <td colspan="8">Proprietary Models</td>
@@ -467,9 +468,9 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 
 <details>
 
- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of video understanding ability.</summary>
 
- #### Video Understanding
+ #### Video understanding ability
 <div align="center">
 <table style="margin: 0 auto; text-align: center;">
 <thead>
@@ -482,10 +483,10 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 <td>Model</td>
 <td>Size</td>
 <td># Frames</td>
- <td>MVBench (Acc.)</td>
- <td>Egoschema (Acc.)</td>
- <td>VideoMME (Acc.)</td>
- <td>Perception-Test (Acc.)</td>
+ <td>MVBench <br>(Acc.)</td>
+ <td>Egoschema <br>(Acc.)</td>
+ <td>VideoMME <br>(Acc.)</td>
+ <td>Perception-Test <br>(Acc.)</td>
 </tr>
 <tr>
 <td colspan="7">Proprietary Models</td>
@@ -607,7 +608,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 <tr>
 <td>Baichuan-Omni</td>
 <td>7B</td>
- <td>1 fps (max 32)</td>
+ <td>1 fps (max 48)</td>
 <td>60.9</td>
 <td>58.8</td>
 <td>58.2</td>
@@ -635,6 +636,7 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 </table>
 </div>
 
+
 <br>
 
 <div align="center">
@@ -799,12 +801,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 
 </details>
 
-
 <details>
 
- <summary>click to view</summary>
+ <summary>Click here to view detailed evaluation results of audio understanding and generation ability.</summary>
 
- #### Audio Comprehensive and Speech Generation
+ #### Audio understanding and generation ability
 <div align="center">
 <table style="margin: 0 auto; text-align: center;">
 <thead>
@@ -915,201 +916,11 @@ We sugguest readers to refer to our [**Github**](https://github.com/baichuan-inc
 </tbody>
 </table>
 </div>
-
-
- </details>
-
-
-
- <details>
-
- <summary>click to view</summary>
-
- #### Omni-modal Understanding
-
- <div align="center">
- <table style="margin: 0 auto; text-align: center;">
- <thead>
- <tr>
- <th colspan="7">Omni-Undesratnding </th>
- </tr>
- <thead>
- <tbody>
- <tr>
- <td>Model</td>
- <td>Size</td>
- <td>Image & Audio</td>
- <td>Image Caption & Audio</td>
- <td>Image & Audio Transcript</td>
- <td>Image Caption & Audio Transcript</td>
- </tr>
- </thead>
- <tr>
- <td colspan="6">Proprietary Models</td>
- </tr>
- <tr>
- <td>GPT4o-mini</td>
- <td>-</td>
- <td>-</td>
- <td>-</td>
- <td>37.0</td>
- <td>37.7</td>
- </tr>
- <tr>
- <td colspan="6">Open-source Models (Omni-modal)</td>
- </tr>
- <tr>
- <td>VITA</td>
- <td>8x7B</td>
- <td>33.1</td>
- <td>31.8</td>
- <td>42.0</td>
- <td>44.2</td>
- </tr>
- <tr>
- <td>VITA-1.5</td>
- <td>7B</td>
- <td>33.4</td>
- <td>29.6</td>
- <td>48.5</td>
- <td><b>47.2<br></td>
- </tr>
- <tr>
- <td>Baichuan-Omni</td>
- <td>7B</td>
- <td>32.2</td>
- <td>26.5</td>
- <td>42.6</td>
- <td>44.2</td>
- </tr>
- <tr>
- <td>MiniCPM-o 2.6</td>
- <td>7B</td>
- <td>40.5</td>
- <td>30.8</td>
- <td><b>53.2<br></td>
- <td>46.3</td>
- </tr>
- <tr>
- <td><b>Baichuan-Omni-1.5<br></td>
- <td>7B</td>
- <td><b>42.9<br></td>
- <td><b>37.7<br></td>
- <td>47.9</td>
- <td>46.9</td>
- </tr>
- </tbody>
- </table>
- </div>
-
 </details>
 
- <details>
-
- <summary>click to view</summary>
-
- #### Medical Image Understanding Capabilities
-
- <div align="center">
- <table style="margin: 0 auto; text-align: center;">
- <thead>
- <tr>
- <th colspan="7">Medical Understanding&nbsp;&nbsp;&nbsp;</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>Model</td>
- <td>Size</td>
- <td>GMAI-MMB-VAL (Acc.)</td>
- <td>OpenMM-Medical (Acc.)</td>
- </tr>
- </thead>
- <tr>
- <td colspan="4">Proprietary Models</td>
- </tr>
- <tr>
- <td>GPT4o-mini</td>
- <td>-</td>
- <td>46.4</td>
- <td>74.3</td>
- </tr>
- <tr>
- <td colspan="4">Open-source Models (Vision-Language)</td>
- </tr>
- <tr>
- <td>Qwen2 VL</td>
- <td>7B</td>
- <td>46.3</td>
- <td>76.9</td>
- </tr>
- <tr>
- <td>Qwen2 VL</td>
- <td>72B</td>
- <td><b>50.7<br></td>
- <td>80.7</td>
- </tr>
- <tr>
- <td colspan="4">Open-source Models (Omni-modal)</td>
- </tr>
- <tr>
- <td>VITA-1.5</td>
- <td>7B</td>
- <td>36.7</td>
- <td>67.1</td>
- </tr>
- <tr>
- <td>MiniCPM-o 2.6</td>
- <td>7B</td>
- <td>41.5</td>
- <td>73.6</td>
- </tr>
- <tr>
- <td><b>Baichuan-Omni-1.5<br></td>
- <td>7B</td>
- <td>49.9</td>
- <td><b>83.8<br></td>
- </tr>
- </tbody>
- </table>
- </div>
-
- </details>
-
- ## Examples
+ ### Examples
 <br>
 
 <div style="display: flex; flex-direction: column; align-items: center;">
 <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/pipeline.png" alt="pipeline" style="margin-bottom: 5px;">
- <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/math.png" alt="math" style="margin-bottom: 5px;">
- <img src="https://github.com/baichuan-inc/Baichuan-Omni-1.5/raw/main/assets/fly_bill.png" alt="fly_bill" style="margin-bottom: 5px;">
- </div>
-
-
- ## 🚀 Quick Start
- We recommend interested scholars to visit our github repo for more details. [**Github**](https://github.com/baichuan-inc/Baichuan-Omni-1.5/)
-
-
- ### Statement
- - We hereby declare that our team has not developed any applications based on Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models, not on iOS, Android, the web, or any other platform. We strongly call on all users not to use Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models for any activities that harm national / social security or violate the law. Also, we ask users not to use Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models for Internet services that have not undergone appropriate security reviews and filings. We hope that all users can abide by this principle and ensure that the development of technology proceeds in a regulated and legal environment.
-
- - We have done our best to ensure the compliance of the data used in the model training process. However, despite our considerable efforts, there may still be some unforeseeable issues due to the complexity of the model and data. Therefore, if any problems arise due to the use of Baichuan-Omni-1.5/Baichuan-Omni-1.5-base open-source models, including but not limited to data security issues, public opinion risks, or any risks and problems brought about by the model being misled, abused, spread or improperly exploited, we will not assume any responsibility.
-
-
-
- ### License
- The community usage of Baichuan-Omni-1.5/Baichuan-Omni-1.5-base requires adherence to [Apache 2.0](https://github.com/baichuan-inc/Baichuan-Omni-1.5/blob/main/LICENSE) and [Community License for Baichuan-Omni-1.5 Models](https://github.com/baichuan-inc/Baichuan-Omni-1.5/blob/main/LICENSE). The Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models supports commercial use. If you plan to use the Baichuan-Omni-1.5/Baichuan-Omni-1.5-base models or its derivatives for commercial purposes, please ensure that your entity meets the following conditions:
-
- 1. The Daily Active Users (DAU) of your or your affiliate's service or product is less than 1 million.
- 2. Neither you nor your affiliates are software service providers or cloud service providers.
- 3. There is no possibility for you or your affiliates to grant the commercial license given to you, to reauthorize it to other third parties without Baichuan's permission.
-
- Upon meeting the above conditions, you need to submit the application materials required by the Baichuan-Omni-1.5 Model Community License Agreement via the following contact email: opensource@baichuan-inc.com. Once approved, Baichuan will hereby grant you a non-exclusive, global, non-transferable, non-sublicensable, revocable commercial copyright license.
-
- <!-- ### Citation
-
- If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!
- ```bib
- @article{
- } -->
- ```
+ <img src="https://github.com/baichuan