Improve model card: title, metadata, project page
This PR makes several improvements to the model card:
- Corrects the main title to "Step-Audio 2 Technical Report".
- Adds `pipeline_tag: any-to-any` to the metadata, ensuring the model is discoverable under this modality.
- Includes `library_name: transformers` in the metadata, enabling direct integration and "how to use" widgets on the Hub.
- Adds clear, explicit links to the paper, official project page, and code repository for better visibility and user experience.
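
The metadata change can be sanity-checked locally before merging. The following is a minimal sketch, not part of this PR: `read_front_matter` is a hypothetical helper, and it assumes the front matter holds only flat `key: value` pairs, as it does in this README. It reads the block between the two `---` markers and reports whether the keys named above are present.

```python
def read_front_matter(readme_text):
    """Return the top-level key/value pairs from a model card's front matter.

    Hypothetical helper for illustration: it handles flat `key: value`
    lines only, not nested YAML.
    """
    lines = readme_text.splitlines()
    # The front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing marker reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        # Skip comments, indented (nested) lines, and lines without a colon.
        if sep and key.strip() and not key.startswith((" ", "\t", "#")):
            metadata[key.strip()] = value.strip()
    # No closing marker was found, so treat the block as absent.
    return {}


# Head of README.md as it looks after this PR.
README_HEAD = """---
license: apache-2.0
pipeline_tag: any-to-any
library_name: transformers
---

<div align="center">
"""

metadata = read_front_matter(README_HEAD)
missing = [
    key
    for key in ("license", "pipeline_tag", "library_name")
    if key not in metadata
]
print(metadata)
print("missing keys:", missing)
```

Running the same check against the README from before this PR would list `pipeline_tag` and `library_name` as missing.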
README.md
CHANGED
|
@@ -1,5 +1,7 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
|
| 5 |
<div align="center">
|
|
@@ -21,6 +23,12 @@ license: apache-2.0
|
|
| 21 |
<a href="https://github.com/stepfun-ai/Step-Audio2/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?&color=blue"/></a>
|
| 22 |
</div>
|
| 23 |
|
| 24 |
## Introduction
|
| 25 |
|
| 26 |
|
|
@@ -133,7 +141,7 @@ CER for Chinese, Cantonese and Japanese and WER for Arabian and English. N/A ind
|
|
| 133 |
<td align="center"><strong>2.71</strong></td>
|
| 134 |
<td align="center">4.47</td>
|
| 135 |
<td align="center">5.05</td>
|
| 136 |
-
<
|
| 137 |
<td align="center">3.05</td>
|
| 138 |
</tr>
|
| 139 |
<tr>
|
|
@@ -190,6 +198,7 @@ CER for Chinese, Cantonese and Japanese and WER for Arabian and English. N/A ind
|
|
| 190 |
<td align="center">7.01</td>
|
| 191 |
<td align="center">2.68</td>
|
| 192 |
<td align="center"><strong>2.53</strong></td>
|
| 193 |
</tr>
|
| 194 |
<tr>
|
| 195 |
<td align="left">KeSpeech phase1</td>
|
|
@@ -854,3 +863,7 @@ The model and code in the repository is licensed under [Apache 2.0](LICENSE) Lic
|
|
| 854 |
url={https://arxiv.org/abs/2507.16632},
|
| 855 |
}
|
| 856 |
```
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
pipeline_tag: any-to-any
|
| 4 |
+
library_name: transformers
|
| 5 |
---
|
| 6 |
|
| 7 |
<div align="center">
|
|
|
|
| 23 |
<a href="https://github.com/stepfun-ai/Step-Audio2/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?&color=blue"/></a>
|
| 24 |
</div>
|
| 25 |
|
| 26 |
+
# Step-Audio 2 Technical Report
|
| 27 |
+
|
| 28 |
+
**Paper**: [Step-Audio 2 Technical Report](https://arxiv.org/abs/2507.16632)
|
| 29 |
+
**Project Page**: [Step-Audio 2 Documentation](https://www.stepfun.com/docs/en/step-audio2)
|
| 30 |
+
**Code**: [GitHub Repository](https://github.com/stepfun-ai/Step-Audio2)
|
| 31 |
+
|
| 32 |
## Introduction
|
| 33 |
|
| 34 |
|
|
|
|
| 141 |
<td align="center"><strong>2.71</strong></td>
|
| 142 |
<td align="center">4.47</td>
|
| 143 |
<td align="center">5.05</td>
|
| 144 |
+
<align="center">3.03</align>
|
| 145 |
<td align="center">3.05</td>
|
| 146 |
</tr>
|
| 147 |
<tr>
|
|
|
|
| 198 |
<td align="center">7.01</td>
|
| 199 |
<td align="center">2.68</td>
|
| 200 |
<td align="center"><strong>2.53</strong></td>
|
| 201 |
+
<td align="center">2.53</td>
|
| 202 |
</tr>
|
| 203 |
<tr>
|
| 204 |
<td align="left">KeSpeech phase1</td>
|
|
|
|
| 863 |
url={https://arxiv.org/abs/2507.16632},
|
| 864 |
}
|
| 865 |
```
|
| 866 |
+
|
| 867 |
+
## Star History
|
| 868 |
+
|
| 869 |
+
[Star History Chart](https://star-history.com/#stepfun-ai/Step-Audio2&Date)
|