khaled123 committed
Commit 6326c12 · verified · 1 Parent(s): b7bd840

Create README.md


*Zenthia GPT: Achieving Efficient Training with Superior Convergence*

Introducing *Zenthia GPT*, a highly efficient model that saves 95% of the computation typically required to train GPT-2, while achieving better convergence with a validation loss of *2.86*.

*Key Features:*
- *MiniPile Dataset*: Trained on a compact dataset of just *1 billion tokens* from MiniPile, enabling quicker training cycles.
- *Optimized Resource Usage*: Utilizes the *Adam Mini Optimizer*, cutting VRAM usage by *50%*, allowing the model to train efficiently even on hardware with limited resources.
- *Context Size Adjustment*: The model uses a *context size of 256* instead of the standard 1024, chosen so training fits within *8 GB of VRAM*. The shorter context improves memory efficiency, though it may limit performance on tasks that require longer context lengths (see the setup sketch after this list).
- *Improved HellaSwag Score*: Despite the reduced context size, the model achieves a *HellaSwag score of 0.26*, highlighting its competitive performance.
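
The snippet below is a minimal sketch, assuming the Hugging Face `datasets` and `transformers` libraries, of how the pieces above fit together: the MiniPile corpus, a GPT-2-style configuration with a 256-token context window, and a stand-in optimizer where Adam Mini would be plugged in. Names and hyperparameters are illustrative, not the exact training code.

```python
# A sketch of the training ingredients described above; assumes the Hugging Face
# `datasets` and `transformers` libraries. Hyperparameters are illustrative.
import torch
from datasets import load_dataset
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

# MiniPile: the ~1B-token corpus listed in this repo's metadata.
dataset = load_dataset("JeanKaddour/minipile", split="train")

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=256,  # reduced from GPT-2's standard 1024 to fit in 8 GB of VRAM
)
model = GPT2LMHeadModel(config)

# Stand-in optimizer: the model card reports using Adam Mini, which roughly
# halves optimizer VRAM relative to Adam/AdamW; AdamW is shown here only
# because its API is standard.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
```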

*Trade-offs and Considerations*:
- The reduced context length may hinder performance on *HellaSwag benchmark* examples whose prompts and endings exceed 256 tokens, since those examples must be truncated (see the scoring sketch after this list).
- *Potential Overfitting*: Because training is limited to a relatively small (though high-quality) dataset, the model shows signs of possible overfitting, which could limit how well it generalizes beyond MiniPile.
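
To make the truncation trade-off concrete, here is a minimal sketch of HellaSwag-style multiple-choice scoring: each candidate ending is scored by its token log-likelihood given the context, with the combined sequence clipped to the last 256 tokens. The `gpt2` checkpoint and the example sentences are placeholders, not this model or the actual evaluation harness.

```python
# A minimal sketch of HellaSwag-style multiple-choice scoring under a 256-token
# window; the `gpt2` checkpoint and example text are placeholders, not this model
# or the actual evaluation harness.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def ending_logprob(context: str, ending: str, max_len: int = 256) -> float:
    """Sum of log-probabilities of the ending tokens given the context,
    with the combined sequence clipped to the last `max_len` tokens."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    end_ids = tokenizer(" " + ending, return_tensors="pt").input_ids[0]
    ids = torch.cat([ctx_ids, end_ids])[-max_len:]  # truncation drops early context
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    logprobs = torch.log_softmax(logits[:-1], dim=-1)  # position i predicts token i+1
    n_end = min(len(end_ids), len(ids) - 1)
    targets = ids[-n_end:].unsqueeze(1)
    return logprobs[-n_end:].gather(1, targets).sum().item()

context = "A man is sitting on a roof. He"
endings = ["starts pulling up roofing tiles.", "is ripping level tiles off."]
scores = [ending_logprob(context, e) for e in endings]
print("chosen ending:", endings[max(range(len(endings)), key=scores.__getitem__)])
```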

Overall, *Zenthia GPT* strikes an effective balance between computational efficiency and performance, pushing the boundaries of what's possible with constrained resources.

Files changed (1)
README.md +7 -0

README.md ADDED
@@ -0,0 +1,7 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - JeanKaddour/minipile
+ language:
+ - en
+ ---