Update README.md
README.md
CHANGED
@@ -14,11 +14,24 @@ Mistral-Small-24B-Instruct-2501-writer is a fine-tuned version of `mistralai/Mis
 
 The following table was generated by creating 568 stories based on the same prompts as in the [lars1234/story_writing_benchmark](https://huggingface.co/datasets/lars1234/story_writing_benchmark) dataset and then evaluating them using the benchmark's evaluator models.
 
-
-
-
-
-
+| Metric | Mistral-2501 | Mistral-Writer | Gemma-Ataraxy |
+|-------|---------|-------------------|---------|
+| Grammar & Spelling | 82.1% | 83.3% | **88.8%** |
+| Clarity | 63.0% | 64.1% | **65.8%** |
+| Logical Connection | 57.7% | 64.1% | **66.0%** |
+| Scene Construction | 56.1% | 62.0% | **64.1%** |
+| Internal Consistency | 67.2% | 73.1% | **75.1%** |
+| Character Consistency | 50.7% | 54.0% | **54.3%** |
+| Character Motivation | 44.6% | **49.8%** | 49.2% |
+| Sentence Variety | 57.7% | **64.4%** | 64.0% |
+| Avoiding Clichés | 24.6% | **33.3%** | 31.2% |
+| Natural Dialogue | 42.9% | **51.9%** | 48.3% |
+| Avoiding Tropes | 28.6% | 37.4% | **40.0%** |
+| Character Depth | 35.7% | **46.4%** | 45.4% |
+| Character Interactions | 45.0% | **52.0%** | 51.7% |
+| Reader Interest | 54.1% | **63.1%** | 63.0% |
+| Plot Resolution | 35.3% | **45.3%** | 44.9% |
+| Average | 49.3% | **56.5%** | 56.1% |
 
 Mistral-Small-24B-Instruct-2501-writer outperforms the base Mistral model across all metrics. Gemma-2-Ataraxy still shows higher creativity in some categories, as seen for example in its better score on "Avoiding Tropes."
 
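The two claims in the closing paragraph can be checked directly against the table added in this change. The following is a minimal sketch, not part of the commit: the scores are copied from the table, while the names `SCORES`, `writer_gain_over_base`, and `metrics_where_gemma_leads` are hypothetical helpers introduced only for illustration. The "Average" row is left out because the diff does not say how it was computed.

```python
# Scores in percent, copied from the table in the diff.
# Tuple order: (Mistral-2501, Mistral-Writer, Gemma-Ataraxy).
# The "Average" row is omitted: its computation is not described in the source.
SCORES = {
    "Grammar & Spelling": (82.1, 83.3, 88.8),
    "Clarity": (63.0, 64.1, 65.8),
    "Logical Connection": (57.7, 64.1, 66.0),
    "Scene Construction": (56.1, 62.0, 64.1),
    "Internal Consistency": (67.2, 73.1, 75.1),
    "Character Consistency": (50.7, 54.0, 54.3),
    "Character Motivation": (44.6, 49.8, 49.2),
    "Sentence Variety": (57.7, 64.4, 64.0),
    "Avoiding Clichés": (24.6, 33.3, 31.2),
    "Natural Dialogue": (42.9, 51.9, 48.3),
    "Avoiding Tropes": (28.6, 37.4, 40.0),
    "Character Depth": (35.7, 46.4, 45.4),
    "Character Interactions": (45.0, 52.0, 51.7),
    "Reader Interest": (54.1, 63.1, 63.0),
    "Plot Resolution": (35.3, 45.3, 44.9),
}


def writer_gain_over_base(scores):
    """Percentage-point gain of Mistral-Writer over Mistral-2501, per metric."""
    return {
        metric: round(writer - base, 1)
        for metric, (base, writer, _gemma) in scores.items()
    }


def metrics_where_gemma_leads(scores):
    """Metrics on which Gemma-Ataraxy scores higher than Mistral-Writer."""
    return [
        metric
        for metric, (_base, writer, gemma) in scores.items()
        if gemma > writer
    ]


gains = writer_gain_over_base(SCORES)
beats_base_everywhere = all(gain > 0 for gain in gains.values())
gemma_leads = metrics_where_gemma_leads(SCORES)

if __name__ == "__main__":
    print("Writer beats base on every metric:", beats_base_everywhere)
    print("Metrics where Gemma-Ataraxy leads:", gemma_leads)
```

Run as a script, this reports whether the fine-tuned model beats the base model on every listed metric and lists the metrics where Gemma-Ataraxy still scores higher, including "Avoiding Tropes".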