---
tags:
- generated_from_triptuner
- transformer
- character-level
- custom-model
license: mit
library_name: torch
pipeline_tag: text-generation
---
# Triptuner Model

This model generates itineraries for locations in Sri Lanka's Central Province. It is a custom transformer-based language model that operates on character-level sequences.

## Usage

The Triptuner model cannot be used directly with Hugging Face's built-in Inference API because it uses a custom architecture. The instructions below show how to load and run the model manually with PyTorch.

### Load and Use the Model with PyTorch
```python
import torch
import torch.nn as nn

# Define the custom model class.
# The complete definition of BigramLanguageModel must match the architecture used
# during training; a sketch of one possible definition is given after this example.
class BigramLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the model layers exactly as in the training setup, for example:
        # self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        # self.position_embedding_table = nn.Embedding(block_size, n_embd)
        # self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        # self.ln_f = nn.LayerNorm(n_embd)
        # self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        # Compute the next-character logits (and, optionally, the loss) for the input indices
        pass

    def generate(self, idx, max_new_tokens):
        # Autoregressively sample max_new_tokens new character indices from the model
        pass

# Load the model weights from Hugging Face
model = BigramLanguageModel()
model_url = "https://huggingface.co/yoonusajwardapiit/triptuner/resolve/main/pytorch_model.bin"
model_weights = torch.hub.load_state_dict_from_url(model_url, map_location=torch.device('cpu'), weights_only=True)
model.load_state_dict(model_weights)
model.eval()

# Define the character mappings
chars = sorted(list(set("your_training_text_here")))  # Replace with the actual character set used in training
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]
decode = lambda l: ''.join([itos[i] for i in l])

# Test the model with a sample prompt
prompt = "Hanthana"  # Replace with any relevant location or prompt
context = torch.tensor([encode(prompt)], dtype=torch.long)

# Generate text using the model
with torch.no_grad():
    generated = model.generate(context, max_new_tokens=250)  # Adjust the number of new tokens as needed

# Decode and print the generated text
generated_text = decode(generated[0].tolist())
print(generated_text)
```
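The loading code above only works once `BigramLanguageModel` is fully defined with the same module names that were used at training time. As a reference point, here is a minimal sketch of one possible definition, assuming a standard pre-norm character-level transformer of the kind the commented layer names above suggest, with the hyperparameters listed in the Model Architecture section below. The helper classes (`Head`, `MultiHeadAttention`, `FeedForward`, `Block`), the placeholder `vocab_size`, and the exact attribute names are assumptions; they must match the keys in the published `state_dict` for `load_state_dict` to succeed.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

# Assumed hyperparameters, taken from the "Model Architecture" section of this card.
n_embd, n_head, n_layer, block_size = 64, 4, 4, 32
vocab_size = 61  # placeholder: must equal the number of distinct characters in the training text

class Head(nn.Module):
    """One head of causal self-attention (assumed structure)."""
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        T = x.shape[1]
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5       # scaled attention scores
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))  # causal mask
        return F.softmax(wei, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x):
        return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

class FeedForward(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Linear(4 * n_embd, n_embd))

    def forward(self, x):
        return self.net(x)

class Block(nn.Module):
    """Pre-norm transformer block: attention then feed-forward, each with a residual."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.sa = MultiHeadAttention(n_head, n_embd // n_head)
        self.ffwd = FeedForward(n_embd)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

class BigramLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, n_embd)
        self.position_embedding_table = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head=n_head) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        tok_emb = self.token_embedding_table(idx)
        pos_emb = self.position_embedding_table(torch.arange(T, device=idx.device))
        x = self.blocks(tok_emb + pos_emb)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss

    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -block_size:]                 # crop to the context length
            logits, _ = self(idx_cond)
            probs = F.softmax(logits[:, -1, :], dim=-1)     # distribution over the next character
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)
        return idx
```

If this sketch matches the trained architecture, the class definition can be used in place of the skeleton in the usage example above.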
## Training Data

The model was trained on a dataset containing information about various locations in Sri Lanka's Central Province.

## Model Architecture

- Number of Layers: 4
- Embedding Size: 64
- Number of Heads: 4
- Context Length: 32 characters (the model is character-level, so each token is one character)
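In code, these settings correspond to the hyperparameter names used in the sketches above (the names themselves are an assumption carried over from the usage example, not from the published training script):

```python
# Assumed hyperparameter names; the values come from this section.
n_layer = 4      # number of transformer blocks
n_embd = 64      # embedding dimension
n_head = 4       # attention heads per block
block_size = 32  # context length in characters
# vocab_size is not listed here: it must equal the number of distinct
# characters in the training text (len(chars) in the usage example).
```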
## License

MIT License