lars1234/Mistral-Small-24B-Instruct-2501-writer

lars1234 committed 4 days ago

Commit 45850ca · verified · 1 Parent(s): 5c496b4

Update README.md

Files changed (1)

README.md +18 -5

@@ -14,11 +14,24 @@ Mistral-Small-24B-Instruct-2501-writer is a fine-tuned version of `mistralai/Mis
 The following table was generated by creating 568 stories based on the same prompts as in the [lars1234/story_writing_benchmark](https://huggingface.co/datasets/lars1234/story_writing_benchmark) dataset and then evaluating them using the benchmark's evaluator models.
-| Model | Average | Grammar & Spelling | Clarity | Logical Connection | Scene Construction | Internal Consistency | Character Consistency | Character Motivation | Sentence Variety | Avoiding Clichés | Natural Dialogue | Avoiding Tropes | Character Depth | Character Interactions | Reader Interest | Plot Resolution |
-|-------|---------|-------------------|---------|-------------------|-------------------|---------------------|----------------------|---------------------|-----------------|----------------|-----------------|----------------|----------------|----------------------|----------------|-----------------|
-| Mistral-2501 | 49.3% | 82.1% | 63.0% | 57.7% | 56.1% | 67.2% | 50.7% | 44.6% | 57.7% | 24.6% | 42.9% | 28.6% | 35.7% | 45.0% | 54.1% | 35.3% |
-| Mistral-Writer | **56.5%** | 83.3% | 64.1% | 64.1% | 62.0% | 73.1% | 54.0% | **49.8%** | **64.4%** | **33.3%** | **51.9%** | 37.4% | **46.4%** | **52.0%** | **63.1%** | **45.3%** |
-| Gemma-Ataraxy | 56.1% | **88.8%** | **65.8%** | **66.0%** | **64.1%** | **75.1%** | **54.3%** | 49.2% | 64.0% | 31.2% | 48.3% | **40.0%** | 45.4% | 51.7% | 63.0% | 44.9% |
+| Metric | Mistral-2501 | Mistral-Writer | Gemma-Ataraxy |
+|-------|---------|-------------------|---------|
+| Grammar & Spelling | 82.1% | 83.3% | **88.8%** |
+| Clarity | 63.0% | 64.1% | **65.8%** |
+| Logical Connection | 57.7% | 64.1% | **66.0%** |
+| Scene Construction | 56.1% | 62.0% | **64.1%** |
+| Internal Consistency | 67.2% | 73.1% | **75.1%** |
+| Character Consistency | 50.7% | 54.0% | **54.3%** |
+| Character Motivation | 44.6% | **49.8%** | 49.2% |
+| Sentence Variety | 57.7% | **64.4%** | 64.0% |
+| Avoiding Clichés | 24.6% | **33.3%** | 31.2% |
+| Natural Dialogue | 42.9% | **51.9%** | 48.3% |
+| Avoiding Tropes | 28.6% | 37.4% | **40.0%** |
+| Character Depth | 35.7% | **46.4%** | 45.4% |
+| Character Interactions | 45.0% | **52.0%** | 51.7% |
+| Reader Interest | 54.1% | **63.1%** | 63.0% |
+| Plot Resolution | 35.3% | **45.3%** | 44.9% |
+| Average | 49.3% | **56.5%** | 56.1% |
 Mistral-Small-24B-Instruct-2501-writer outperforms the base Mistral model across all metrics. Gemma-2-Ataraxy still shows higher creativity in some categories, as seen for example in its better score on "Avoiding Tropes."
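
The closing claim can be checked against the table itself. The following is a minimal sketch, not part of the README or the benchmark tooling: the per-metric scores are copied by hand from the transposed table, the "Average" row is left out, and the names `SCORES`, `writer_beats_base`, and `gemma_leads` are hypothetical helpers introduced only for this illustration.

```python
# Per-metric scores in percent, copied by hand from the table in this diff.
# Tuple order: (Mistral-2501, Mistral-Writer, Gemma-Ataraxy).
# The "Average" row is intentionally omitted.
SCORES = {
    "Grammar & Spelling": (82.1, 83.3, 88.8),
    "Clarity": (63.0, 64.1, 65.8),
    "Logical Connection": (57.7, 64.1, 66.0),
    "Scene Construction": (56.1, 62.0, 64.1),
    "Internal Consistency": (67.2, 73.1, 75.1),
    "Character Consistency": (50.7, 54.0, 54.3),
    "Character Motivation": (44.6, 49.8, 49.2),
    "Sentence Variety": (57.7, 64.4, 64.0),
    "Avoiding Clichés": (24.6, 33.3, 31.2),
    "Natural Dialogue": (42.9, 51.9, 48.3),
    "Avoiding Tropes": (28.6, 37.4, 40.0),
    "Character Depth": (35.7, 46.4, 45.4),
    "Character Interactions": (45.0, 52.0, 51.7),
    "Reader Interest": (54.1, 63.1, 63.0),
    "Plot Resolution": (35.3, 45.3, 44.9),
}


def writer_beats_base(scores):
    """Return True if Mistral-Writer scores above Mistral-2501 on every metric."""
    return all(writer > base for base, writer, _ in scores.values())


def gemma_leads(scores):
    """Return the metrics where Gemma-Ataraxy scores above Mistral-Writer."""
    return [name for name, (_, writer, gemma) in scores.items() if gemma > writer]


if __name__ == "__main__":
    print("Writer beats base on all metrics:", writer_beats_base(SCORES))
    print("Metrics where Gemma-Ataraxy leads:", gemma_leads(SCORES))
```

Run against these values, the check agrees with the sentence above: the fine-tuned model is ahead of the base model on every listed metric, while Gemma-Ataraxy keeps the lead on a subset that includes "Avoiding Tropes".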