javi8979 committed · Commit 6322785 · verified · Parent(s): 41e507c

Update README.md

Files changed (1): README.md (+15 −2)
 
## Evaluation

Below are the evaluation results on the Flores+200 devtest set, compared against the state-of-the-art MADLAD400-7B model ([Kudugunta, S., et al.](https://arxiv.org/abs/2309.04662)). These results cover the CA-XX, ES-XX, and EN-XX translation directions, as well as XX-CA, XX-ES, and XX-EN. The metrics have been computed excluding Asturian, Aranese, and Aragonese, as we report them separately. The evaluation was conducted using [MT Lens](https://github.com/langtech-bsc/mt-evaluation) following the standard setting (beam search with beam size 5, limiting the translation length to 500 tokens).
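
For illustration, here is a minimal sketch of that decoding setting using the Hugging Face `transformers` API. The model id, prompt format, and dtype are assumptions made for this example, not the exact MT Lens harness configuration.

```python
# Minimal decoding sketch matching the setting above: beam search with
# beam size 5 and translations capped at 500 tokens. The model id and
# prompt format below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandraTA-7b-instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Translate the following text from English into Catalan.\n"
    "English: The weather is nice today.\nCatalan:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    num_beams=5,          # standard setting: beam size 5
    max_new_tokens=500,   # translation length limited to 500 tokens
    early_stopping=True,
)
# Strip the prompt tokens and print only the generated translation.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```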

We report the following metrics; a scoring sketch follows the list:

<details>
<summary>Click to show metrics details</summary>

- `Comet`: Model checkpoint: "Unbabel/wmt22-comet-da".
- `Comet-kiwi`: Model checkpoint: "Unbabel/wmt22-cometkiwi-da".
- `Bleurt`: Model checkpoint: "lucadiliello/BLEURT-20".
- `MetricX`: Model checkpoint: "google/metricx-23-xl-v2p0".
- `MetricX-QE`: Model checkpoint: "google/metricx-23-qe-xl-v2p0".

</details>
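
As a concrete usage sketch for these checkpoints, the snippet below scores a single hypothesis with the `Comet` checkpoint via the `unbabel-comet` Python package; the sentences are placeholders, not evaluation data.

```python
# Minimal sketch: scoring one hypothesis with the Comet checkpoint
# listed above ("Unbabel/wmt22-comet-da") via the unbabel-comet package.
# The sentences are illustrative placeholders, not evaluation data.
from comet import download_model, load_from_checkpoint

ckpt_path = download_model("Unbabel/wmt22-comet-da")
comet_model = load_from_checkpoint(ckpt_path)

data = [
    {
        "src": "The weather is nice today.",  # source sentence
        "mt": "El temps és agradable avui.",  # system translation
        "ref": "Avui fa bon temps.",          # reference translation
    }
]
# For the reference-free Comet-kiwi checkpoint, the "ref" field is omitted.
result = comet_model.predict(data, batch_size=8, gpus=0)
print(result.system_score)  # corpus-level score
```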

### English

| | Bleu ↑ | Ter ↓ | ChrF ↑ | Comet ↑ | Comet-kiwi ↑ | Bleurt ↑ | MetricX ↓ | MetricX-QE ↓ |
|:-------------------------|----------:|----------:|----------:|---------:|-------------:|---------:|----------:|-------------:|
| **EN-XX**                |           |           |           |          |              |          |           |              |
| SalamandraTA-7b-instruct | **36.29** | **50.62** | 63.3      | **0.89** | **0.85**     | **0.79** | **1.02**  | **0.94**     |
| MADLAD400-7B             | 35.73     | 51.87     | **63.46** | 0.88     | **0.85**     | **0.79** | 1.16      | 1.1          |
| SalamandraTA-7b-base     | 34.99     | 52.64     | 62.58     | 0.87     | 0.84         | 0.77     | 1.45      | 1.23         |
| **XX-EN**                |           |           |           |          |              |          |           |              |
| SalamandraTA-7b-instruct | **44.69** | **41.72** | 68.17     | **0.89** | **0.85**     | **0.8**  | **1.09**  | **1.11**     |
| SalamandraTA-7b-base     | 44.12     | 43        | **68.43** | **0.89** | **0.85**     | **0.8**  | 1.13      | 1.22         |
| MADLAD400-7B             | 43.2      | 43.33     | 67.98     | 0.89     | 0.86         | 0.8      | 1.13      | 1.15         |
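
For completeness, the surface-overlap columns of the table (Bleu, Ter, ChrF) can be computed with the `sacrebleu` Python package, as in the minimal sketch below; whether MT Lens wraps sacrebleu internally is an assumption here.

```python
# Minimal sketch: computing the surface-overlap metrics from the table
# (Bleu, Ter, ChrF) with the sacrebleu package. The sentences are
# placeholders; whether MT Lens uses sacrebleu internally is an assumption.
import sacrebleu

hypotheses = ["El temps és agradable avui."]
references = [["Avui fa bon temps."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU={bleu.score:.2f}  TER={ter.score:.2f}  ChrF={chrf.score:.2f}")
```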

## Ethical Considerations and Limitations