	---
	license: apache-2.0
	tags:
	- Drenel
	- Hippo
	- LLM
	- MultiLingual
	- Drenel/Hippo-6B
	base_model:
	- Drenel/Hippo-6B
	library_name: transformers
	---

	## Model Details

	Hippo-6B is a transformer-based language model with 6.2 billion parameters, designed for a wide range of natural language processing tasks. Its size is intended to balance computational efficiency against output quality, making it suitable for a variety of applications.

	Context Length: Up to 4K tokens

	Publisher: Drenel

	## Key Features and Technologies

	### 1. Efficient Attention Mechanism

	- Flash Attention: Hippo-6B uses flash attention functions (`flash_attn_func` and `flash_attn_varlen_func`) to compute attention. Flash attention computes exact attention in tiles without materializing the full attention matrix, which reduces memory usage and helps the model handle longer contexts efficiently.
	- Support for Window Size: The model includes conditional support for attention windows, allowing for flexible and scalable attention mechanisms based on the available hardware and task requirements.
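
To make the idea of an attention window concrete, here is a minimal sketch of a causal sliding-window mask in plain Python. The function name and the convention that `window` counts the current token plus the `window - 1` tokens before it are assumptions made for illustration; the model's own windowing logic may differ.

```python
def sliding_window_causal_mask(seq_len, window):
    """Build a seq_len x seq_len mask; True means query i may attend to key j.

    Assumed convention: each query sees itself and the (window - 1) keys before it.
    """
    mask = []
    for i in range(seq_len):
        # Causal (j <= i) and within the window (i - j < window).
        row = [(j <= i) and (i - j < window) for j in range(seq_len)]
        mask.append(row)
    return mask


mask = sliding_window_causal_mask(seq_len=4, window=2)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

With a window much smaller than the sequence length, each query attends to a bounded number of keys, so attention cost grows roughly linearly rather than quadratically with sequence length.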

	### 2. Rotary Embeddings

	- Rotary Position Embeddings: Hippo-6B employs rotary position embeddings (`RotaryEmbedding`), which encode position by rotating query and key vectors through position-dependent angles. Attention scores then depend on the relative distance between tokens, which helps the model capture long-range dependencies.
	- Scaled Rotary Embeddings: Variations such as `SuScaledRotaryEmbedding` and `YarnScaledRotaryEmbedding` adapt the rotary embeddings to different scaling factors, providing finer control over the embedding space.
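
The following is a minimal sketch of the rotation behind rotary embeddings, written in plain Python. It assumes the interleaved pair layout and a base of 10000; the function names are hypothetical, and the model's `RotaryEmbedding` class may use a different layout or scaling.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i + 1]) by the angle position * base**(-i / d)."""
    d = len(vec)
    assert d % 2 == 0, "the vector length must be even"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Both pairs of positions are 2 apart, so the two scores should match.
print(dot(rotary_embed(q, 5), rotary_embed(k, 3)))
print(dot(rotary_embed(q, 2), rotary_embed(k, 0)))
```

The two printed scores agree up to floating-point error because the dot product of two rotated vectors depends only on the difference between their positions.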

	### 3. RMS Norm

	- RMS Normalization: The model utilizes Root Mean Square (RMS) normalization layers (`RMSNorm`) to stabilize training and improve convergence. RMS normalization helps in maintaining consistent gradient flow across layers, leading to more efficient training dynamics.
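
As a rough illustration of what an RMS normalization layer computes, the sketch below divides a vector by its root mean square and applies a learned per-feature weight. The function name and the `eps` value are assumptions; the model's `RMSNorm` class operates on tensors and may differ in detail.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x so its root mean square is about 1, then apply a per-feature weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]


y = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
print(y)
```

Unlike layer normalization, this does not subtract the mean, so it needs only one reduction over the feature dimension.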

	### 4. Modular and Scalable Design

	- Modular Attention Classes: Hippo-6B features a modular design with different attention classes (`Attention`, `FlashAttention2`, `SdpaAttention`). This modularity allows easy customization and scalability of the attention mechanisms based on specific use cases.
	- MLP Layers: The model incorporates Multi-Layer Perceptron (MLP) layers with gating mechanisms to enhance the model's expressive power. The `MLP` class includes techniques such as expert gating and intermediate projections for more sophisticated representations.
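
The sketch below shows one common form of a gated MLP, in which an activated gate projection multiplies an up projection element-wise before a down projection. The SiLU activation, the weight names, and the list-based matrices are assumptions chosen for illustration and are not taken from the model's `MLP` class.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def gated_mlp(x, w_gate, w_up, w_down):
    """Compute w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


x = [1.0, 2.0]
w_gate = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]  # hidden size 3, toy values
w_up = [[1.0, 1.0], [0.0, 2.0], [-1.0, 0.5]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(gated_mlp(x, w_gate, w_up, w_down))
```

The gate lets the layer scale each intermediate feature depending on the input, rather than applying a fixed nonlinearity to a single projection.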

	### 5. Caching and Memory Efficiency

	- Dynamic Caching: The model supports dynamic caching strategies (`Cache`, `DynamicCache`) to optimize memory usage during inference, allowing for faster and more efficient processing of long sequences.
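
To illustrate why caching speeds up generation, here is a toy key/value cache in plain Python. `ToyKVCache` is a hypothetical class written for this example, not the `DynamicCache` class itself; it only shows the idea of appending the keys and values of new tokens instead of recomputing them for the whole sequence.

```python
class ToyKVCache:
    """Toy per-layer key/value store that grows as tokens are generated."""

    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def update(self, layer_idx, new_keys, new_values):
        """Append entries for new tokens and return the full history for the layer."""
        self.keys[layer_idx].extend(new_keys)
        self.values[layer_idx].extend(new_values)
        return self.keys[layer_idx], self.values[layer_idx]

    def seq_length(self, layer_idx=0):
        """Number of cached positions for a layer."""
        return len(self.keys[layer_idx])


cache = ToyKVCache(num_layers=2)
cache.update(0, ["k0", "k1"], ["v0", "v1"])  # prompt tokens
keys, values = cache.update(0, ["k2"], ["v2"])  # one newly generated token
print(keys, cache.seq_length(0))
```

At each decoding step only the newest token's keys and values are computed, and attention reads the rest from the cache.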

	### 6. Loss Functions

	- Cross-Entropy Loss: For classification tasks, the model uses Cross-Entropy Loss, which penalizes assigning low probability to the correct class.
	- Mean Squared Error (MSE) Loss: For regression tasks, MSE Loss is used to minimize the squared difference between predicted and actual values.
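
For reference, the two losses can be written out in a few lines of plain Python. This is a sketch of the standard definitions for a single example, with hypothetical function names; it is not the model's training code.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


print(cross_entropy([2.0, 0.5, -1.0], target=0))
print(mse([1.0, 3.0], [0.0, 1.0]))  # (1 + 4) / 2 = 2.5
```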

	## Usage

	Hippo-6B can be used for a variety of NLP tasks, including but not limited to:

	- Text Generation
	- Language Translation
	- Sentiment Analysis
	- Named Entity Recognition
	- Text Classification

	### Chat Format

	You can provide the prompt as a question using the following generic template:
	```markdown
	<|user|>\nQuestion<|end|>\n<|assistant|>
	```
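
As a small convenience, the template above can be filled in with a helper like the one below. `build_prompt` is a hypothetical name used for illustration, and the sketch covers only the single-turn format shown in the template.

```python
def build_prompt(question):
    """Wrap a question in the single-turn chat template."""
    return f"<|user|>\n{question}<|end|>\n<|assistant|>"


prompt = build_prompt("What is the capital of France?")
print(prompt)
```

The resulting string ends with the `<|assistant|>` marker, so the model's continuation is the assistant's reply.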

	## Example

	Here is a quick example of how to use Hippo-6B for text generation:

	```python
	# Install the required libraries first:
	# pip install -q transformers accelerate flash-attn

	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

	# Fix the random seed so that sampled output is reproducible.
	torch.random.manual_seed(0)
	model_name = "Drenel/Hippo-6B"

	# trust_remote_code=True lets transformers run the model code shipped in the repository.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    device_map="cuda",
	    torch_dtype="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	messages = [
	    {"role": "user", "content": "What is the capital of France? <|end|><|assistant|>"},
	]

	pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

	# do_sample=True is required for temperature, top_k, and top_p to take effect.
	generation_args = {
	    "max_new_tokens": 50,
	    "return_full_text": False,
	    "temperature": 0.7,
	    "do_sample": True,
	    "top_k": 50,
	    "top_p": 0.95,
	}
	output = pipe(messages, **generation_args)
	print(output[0]["generated_text"])
	```


	## License

	Hippo-6B is distributed under the Apache-2.0 license.