---
license: mit
language:
- en
pipeline_tag: feature-extraction
tags:
- sentiment-analysis
- text-classification
- generic
- sentiment-classification
datasets:
- Numind/C4_sentiment-analysis
---
## Model
The base version of [e5-v2](https://huggingface.co/intfloat/e5-base-v2), fine-tuned on an annotated subset of [C4](https://huggingface.co/datasets/Numind/C4_sentiment-analysis). This model produces generic embeddings for sentiment analysis. The embeddings can be used out of the box or fine-tuned on specific datasets.

Blog post: https://www.numind.ai/blog/creating-task-specific-foundation-models-with-gpt-4
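One way to use the embeddings "out of the box," with no training, is nearest-centroid classification: embed a handful of labeled example texts, average them per label, and assign a new text to the label whose centroid is most cosine-similar. The sketch below assumes the embeddings are already computed, for example with the snippet in the Usage section. `nearest_centroid_predict` is a hypothetical helper for illustration, not part of this model or of `transformers`, and it is not presented as the authors' evaluation method.

```python
import torch
import torch.nn.functional as F


def nearest_centroid_predict(query: torch.Tensor, examples: torch.Tensor, labels: list[str]) -> str:
    """Hypothetical helper: return the label whose mean example embedding is
    most cosine-similar to `query`.

    query:    (hidden_size,) embedding of the text to classify
    examples: (n, hidden_size) embeddings of labeled reference texts
    labels:   n string labels, aligned with the rows of `examples`
    """
    classes = sorted(set(labels))
    # Average the reference embeddings of each class into one centroid per class.
    centroids = torch.stack([
        examples[[i for i, lab in enumerate(labels) if lab == c]].mean(dim=0)
        for c in classes
    ])
    # Cosine similarity between the query and every centroid, shape (num_classes,).
    scores = F.cosine_similarity(query.unsqueeze(0), centroids, dim=1)
    return classes[int(scores.argmax())]
```

With the Usage snippet, `embText[0]` would be the `query`, and `examples` would be the stacked embeddings of a few labeled texts.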
## Usage
The example below encodes a text and computes its embedding:
```python
import torch
from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("Numind/e5-base-sentiment_analysis")
tokenizer = AutoTokenizer.from_pretrained("Numind/e5-base-sentiment_analysis")

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
model.eval()  # inference mode (disables dropout)

size = 256  # every input is truncated or padded to exactly this many tokens
text = "This movie is amazing"

# return_tensors="pt" yields input_ids of shape (1, size), so no manual reshape is needed.
encoding = tokenizer(
    text,
    truncation=True,
    padding="max_length",
    max_length=size,
    return_tensors="pt",
)

with torch.no_grad():  # no gradients are needed for inference
    output = model(input_ids=encoding.input_ids.to(device), output_hidden_states=True)

# Last-layer token embeddings, shape (1, size, hidden_size).
emb = output.hidden_states[-1].cpu()
# Average over all `size` positions (padding positions included)
# to get one text embedding of shape (1, hidden_size).
embText = torch.mean(emb, dim=1)
```