javi8979 committed · Commit 6322785 · verified · Parent(s): 41e507c

Update README.md

Files changed (1): README.md (+15 −2)
 
## Evaluation

Below are the evaluation results on the Flores+200 devtest set, compared against the state-of-the-art MADLAD400-7B model ([Kudugunta, S., et al.](https://arxiv.org/abs/2309.04662)). These results cover the CA-XX, ES-XX, and EN-XX translation directions, as well as XX-CA, XX-ES, and XX-EN. The metrics have been computed excluding Asturian, Aranese, and Aragonese, as we report them separately. The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens).
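
For illustration, here is a minimal sketch of that decoding setting using the Hugging Face `transformers` API. The model id, prompt format, and dtype are assumptions made for this example, not the exact MT Lens harness configuration.

```python
# Minimal decoding sketch matching the setting above: beam search with
# beam size 5 and translations capped at 500 tokens. The model id and
# prompt format below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandraTA-7b-instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Translate the following text from English into Catalan.\n"
    "English: The weather is nice today.\nCatalan:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    num_beams=5,          # standard setting: beam size 5
    max_new_tokens=500,   # translation length limited to 500 tokens
    early_stopping=True,
)
# Strip the prompt tokens and print only the generated translation.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```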

We report the following metrics; a scoring sketch follows the list:

<details>
<summary>Click to show metrics details</summary>

- `Comet`: Model checkpoint: "Unbabel/wmt22-comet-da".
- `Comet-kiwi`: Model checkpoint: "Unbabel/wmt22-cometkiwi-da".
- `Bleurt`: Model checkpoint: "lucadiliello/BLEURT-20".
- `MetricX`: Model checkpoint: "google/metricx-23-xl-v2p0".
- `MetricX-QE`: Model checkpoint: "google/metricx-23-qe-xl-v2p0".

</details>
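
As a concrete usage sketch for these checkpoints, the snippet below scores a single hypothesis with the `Comet` checkpoint via the `unbabel-comet` Python package; the sentences are placeholders, not evaluation data.

```python
# Minimal sketch: scoring one hypothesis with the Comet checkpoint
# listed above ("Unbabel/wmt22-comet-da") via the unbabel-comet package.
# The sentences are illustrative placeholders, not evaluation data.
from comet import download_model, load_from_checkpoint

ckpt_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(ckpt_path)

data = [
    {
        "src": "The weather is nice today.",  # source sentence
        "mt": "El temps és agradable avui.",  # system translation
        "ref": "Avui fa bon temps.",          # reference translation
    }
]
# For the reference-free Comet-kiwi checkpoint, the "ref" field is omitted.
result = comet_model.predict(data, batch_size=8, gpus=0)
print(result.system_score)  # corpus-level score
```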

### English

| | Bleu ↑ | Ter ↓ | ChrF ↑ | Comet ↑ | Comet-kiwi ↑ | Bleurt ↑ | MetricX ↓ | MetricX-QE ↓ |
|:-------------------------|----------:|----------:|----------:|---------:|-------------:|---------:|----------:|-------------:|
| **EN-XX**                |           |           |           |          |              |          |           |              |
| SalamandraTA-7b-instruct | **36.29** | **50.62** | 63.3      | **0.89** | **0.85**     | **0.79** | **1.02**  | **0.94**     |
| MADLAD400-7B             | 35.73     | 51.87     | **63.46** | 0.88     | **0.85**     | **0.79** | 1.16      | 1.1          |
| SalamandraTA-7b-base     | 34.99     | 52.64     | 62.58     | 0.87     | 0.84         | 0.77     | 1.45      | 1.23         |
| **XX-EN**                |           |           |           |          |              |          |           |              |
| SalamandraTA-7b-instruct | **44.69** | **41.72** | 68.17     | **0.89** | **0.85**     | **0.8**  | **1.09**  | **1.11**     |
| SalamandraTA-7b-base     | 44.12     | 43        | **68.43** | **0.89** | **0.85**     | **0.8**  | 1.13      | 1.22         |
| MADLAD400-7B             | 43.2      | 43.33     | 67.98     | 0.89     | 0.86         | 0.8      | 1.13      | 1.15         |
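
For completeness, the surface-overlap columns of the table (Bleu, Ter, ChrF) can be computed with the `sacrebleu` Python package, as in the minimal sketch below; whether MT Lens wraps sacrebleu internally is an assumption here.

```python
# Minimal sketch: computing the surface-overlap metrics from the table
# (Bleu, Ter, ChrF) with the sacrebleu package. The sentences are
# placeholders; whether MT Lens uses sacrebleu internally is an assumption.
import sacrebleu

hypotheses = ["El temps és agradable avui."]
references = [["Avui fa bon temps."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU={bleu.score:.2f}  TER={ter.score:.2f}  ChrF={chrf.score:.2f}")
```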

## Ethical Considerations and Limitations