Update README.md

README.md CHANGED
@@ -355,7 +355,7 @@ Click the expand button below to see the full list of tasks included in the fine
 
 ## Evaluation
 
-The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens). We report the following metrics:
+Below are the evaluation results on the Flores+200 devtest set, compared against the state-of-the-art MADLAD400-7B model ([Kudugunta, S., et al.](https://arxiv.org/abs/2309.04662)). These results cover the CA-XX, ES-XX, and EN-XX translation directions, as well as XX-CA, XX-ES, and XX-EN. The metrics were computed excluding Asturian, Aranese, and Aragonese, which we report separately. The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens). We report the following metrics:
 
 <details>
 <summary>Click to show metrics details</summary>
@@ -366,10 +366,23 @@ The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-
 - `Comet`: Model checkpoint: "Unbabel/wmt22-comet-da".
 - `Comet-kiwi`: Model checkpoint: "Unbabel/wmt22-cometkiwi-da".
 - `Bleurt`: Model checkpoint: "lucadiliello/BLEURT-20".
+- `MetricX`: Model checkpoint: "google/metricx-23-xl-v2p0".
+- `MetricX-QE`: Model checkpoint: "google/metricx-23-qe-xl-v2p0".
 
 </details>
 
-
+### English
+
+|                          | Bleu ↑    | Ter ↓     | ChrF ↑    | Comet ↑  | Comet-kiwi ↑ | Bleurt ↑ | MetricX ↓ | MetricX-QE ↓ |
+|:-------------------------|----------:|----------:|----------:|---------:|-------------:|---------:|----------:|-------------:|
+| **EN-XX**                |           |           |           |          |              |          |           |              |
+| SalamandraTA-7b-instruct | **36.29** | **50.62** | 63.30     | **0.89** | **0.85**     | **0.79** | **1.02**  | **0.94**     |
+| MADLAD400-7B             | 35.73     | 51.87     | **63.46** | 0.88     | **0.85**     | **0.79** | 1.16      | 1.10         |
+| SalamandraTA-7b-base     | 34.99     | 52.64     | 62.58     | 0.87     | 0.84         | 0.77     | 1.45      | 1.23         |
+| **XX-EN**                |           |           |           |          |              |          |           |              |
+| SalamandraTA-7b-instruct | **44.69** | **41.72** | 68.17     | **0.89** | 0.85         | **0.80** | **1.09**  | **1.11**     |
+| SalamandraTA-7b-base     | 44.12     | 43.00     | **68.43** | **0.89** | 0.85         | **0.80** | 1.13      | 1.22         |
+| MADLAD400-7B             | 43.20     | 43.33     | 67.98     | **0.89** | **0.86**     | **0.80** | 1.13      | 1.15         |
 
 ## Ethical Considerations and Limitations
 
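As a point of reference, the "standard setting" pinned down in this change maps directly onto Hugging Face `generate` arguments. Below is a minimal decoding sketch, assuming the model is available on the Hugging Face Hub; the checkpoint id `BSC-LT/salamandraTA-7b-instruct` and the plain prompt format are assumptions rather than something this commit specifies, so check the model card before relying on them.

```python
# Minimal decoding sketch for the "standard setting" above:
# beam search with beam size 5, translation length capped at 500 tokens.
# The checkpoint id and prompt format are assumptions, not taken from
# this commit; consult the model card for the exact values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandraTA-7b-instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed prompt format for an instruction-tuned translation model.
prompt = (
    "Translate the following text from English into Catalan.\n"
    "English: The weather is nice today.\n"
    "Catalan:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Beam search, beam size 5; cap the translation at 500 new tokens.
output = model.generate(
    **inputs,
    num_beams=5,
    do_sample=False,
    early_stopping=True,
    max_new_tokens=500,
)
# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```

Deterministic beam search (no sampling) is what keeps the reported scores reproducible across runs.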
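The metric checkpoints listed in the details block can also be exercised directly. Here is a minimal scoring sketch for the two COMET-family metrics, assuming the `unbabel-comet` package (`pip install unbabel-comet`); the example sentences are purely illustrative.

```python
# Minimal sketch: scoring with the COMET checkpoints named above.
# Requires `pip install unbabel-comet`; the sentences are illustrative.
from comet import download_model, load_from_checkpoint

# Reference-based COMET: needs source, hypothesis, and reference.
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [
    {
        "src": "The weather is nice today.",  # source sentence
        "mt": "Avui fa bon temps.",           # system translation
        "ref": "Avui fa bon dia.",            # human reference
    }
]
print("Comet:", comet.predict(data, batch_size=8, gpus=0).system_score)

# Reference-free COMET-Kiwi: same call, but only "src" and "mt" are used.
# Note that the wmt22-cometkiwi-da checkpoint is gated on the Hub and
# requires accepting its license first.
kiwi = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
print("Comet-kiwi:", kiwi.predict(data, batch_size=8, gpus=0).system_score)
```

Bleurt and the MetricX checkpoints ship in their own packages with their own input formats; running everything through MT Lens, as this README does, wraps all of these behind a single interface.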