RangiLyu committed Β· verified
Commit 12e6d31 Β· Parent(s): af13f02

Update README.md

Files changed (1): README.md (+24 -3)
README.md CHANGED
@@ -36,11 +36,32 @@ Built upon a 8B dense language model (Qwen3) and a 400M Vision encoder (InternVi
 
 ## Performance
 
-We evaluate the Intern-S1-mini on various benchmarks including general datasets and scientifc datasets. We report the performance comparsion with the recent VLMs and LLMs below.
-
-> **Note**: βœ… means the best performance among open-sourced models, πŸ‘‘ indicates the best performance among all models.
+We evaluate the Intern-S1-mini on various benchmarks including general datasets and scientific datasets. We report the performance comparison with the recent VLMs and LLMs below.
+
+
+| | | Intern-S1-mini | Qwen3-8B | GLM-4.1V | MiMo-VL-7B-RL-2508 |
+|------------|----------------|-------------------|----------|----------|--------------------|
+| General | MMLU-Pro | **74.78** | 73.7 | 57.1 | 73.93 |
+| γ€€ | MMMU | **72.33** | N/A | 69.9 | 70.4 |
+| γ€€ | MMStar | 65.2 | N/A | 71.5 | 72.9 |
+| γ€€ | GPQA | **65.15** | 62 | 50.32 | 60.35 |
+| γ€€ | AIME2024 | **84.58** | 76 | 36.2 | 72.6 |
+| γ€€ | AIME2025 | **80** | 67.3 | 32 | 64.4 |
+| γ€€ | MathVision | 51.41 | N/A | 53.9 | 54.5 |
+| γ€€ | MathVista | 70.3 | N/A | 80.7 | 79.4 |
+| γ€€ | IFEval | 81.15 | 85 | 71.53 | 71.4 |
+| | | | | | |
+| Scientific | SFE | 35.84 | N/A | 43.2 | 43.9 |
+| γ€€ | Physics | **28.76** | N/A | 4.3 | 23.9 |
+| γ€€ | SmolInstruct | **32.2** | 17.6 | 18.1 | 16.11 |
+| γ€€ | ChemBench | **76.47** | 61.1 | 56.2 | 66.78 |
+| γ€€ | MatBench | **61.55** | 45.24 | 54.3 | 46.9 |
+| γ€€ | MicroVQA | **56.62** | N/A | 50.2 | 50.96 |
+| γ€€ | ProteinLMBench | 58.47 | 59.1 | 58.3 | 59.8 |
+| γ€€ | MSEarthMCQ | **58.12** | N/A | 50.3 | 47.3 |
+| γ€€ | XLRS-Bench | **51.63** | N/A | 49.8 | 12.29 |
 
 We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
 
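The boldface entries in the new table mark benchmarks where Intern-S1-mini scores highest. As a quick sanity check, here is a minimal Python sketch (not part of the commit; values hand-copied from four rows of the table above, with N/A entries simply omitted) that recomputes the per-benchmark leader:

```python
# Excerpt of the performance table, keyed as benchmark -> {model: score}.
# "N/A" cells are left out rather than stored as missing values.
scores = {
    "MMLU-Pro":  {"Intern-S1-mini": 74.78, "Qwen3-8B": 73.7, "GLM-4.1V": 57.1, "MiMo-VL-7B-RL-2508": 73.93},
    "GPQA":      {"Intern-S1-mini": 65.15, "Qwen3-8B": 62.0, "GLM-4.1V": 50.32, "MiMo-VL-7B-RL-2508": 60.35},
    "MathVista": {"Intern-S1-mini": 70.3, "GLM-4.1V": 80.7, "MiMo-VL-7B-RL-2508": 79.4},  # Qwen3-8B: N/A
    "ChemBench": {"Intern-S1-mini": 76.47, "Qwen3-8B": 61.1, "GLM-4.1V": 56.2, "MiMo-VL-7B-RL-2508": 66.78},
}

# Highest-scoring model per benchmark.
best = {bench: max(models, key=models.get) for bench, models in scores.items()}
for bench, model in best.items():
    print(f"{bench}: {model}")
```

This agrees with the bolding convention: Intern-S1-mini leads on MMLU-Pro, GPQA, and ChemBench in this excerpt, while MathVista (unbolded) goes to GLM-4.1V.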