PoC Embeddings
non-profit
AI & ML interests
Embeddings
Organization Card
Description
Our goal was to create a Proof of Concept (PoC) solution for matching messages from Telegram marketplaces.
We developed two models:
- RoSBERTa-hermes-ru: Trained for location recognition, category labeling, and inside-outside location classification.
- rubert-tiny-separater: Trained for supply and demand classification.
Architecture and Pretraining
RoSBERTa-hermes-ru
RoSBERTa-hermes-ru is based on ai-forever/ru-en-RoSBERTa, with multiple heads for downstream tasks:
- Backbone: Fully unfrozen, with the NER head fine-tuned for location recognition.
- Allocator head: Trained to determine whether or not a message contains the actual location of the user.
- Tags head with a single adapter layer: Trained to label messages with categories describing their context, such as tools, medicine, and clothing.
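The head layout above can be illustrated with a minimal, framework-free sketch. It is not the released implementation. The hidden size, the BIO tag set, mean pooling, the tanh adapter, and the sigmoid multi-label output are all assumptions made for illustration, and the weights are random stand-ins. `MultiHeadSketch` and `HeadOutputs` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass

HIDDEN = 8                                      # hypothetical size; the real backbone is far larger
NER_LABELS = ["O", "B-LOC", "I-LOC"]            # assumed BIO tag set for locations
TAG_LABELS = ["tools", "medicine", "clothing"]  # example categories named in the text


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rng, rows, cols):
    """Random stand-in weights; a trained model would load real ones."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class HeadOutputs:
    ner_logits: list        # one row of NER logits per token
    allocator_logit: float  # does the message contain the user's actual location?
    tag_probs: list         # one independent probability per category


class MultiHeadSketch:
    """Three task heads reading the same backbone output."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.ner_head = random_matrix(rng, len(NER_LABELS), HIDDEN)
        self.allocator_head = random_matrix(rng, 1, HIDDEN)
        self.tags_adapter = random_matrix(rng, HIDDEN, HIDDEN)
        self.tags_head = random_matrix(rng, len(TAG_LABELS), HIDDEN)

    def forward(self, token_states):
        # token_states: per-token hidden vectors produced by the shared backbone.
        ner_logits = [linear(self.ner_head, state) for state in token_states]

        # Assumption: message-level heads read a mean-pooled vector.
        pooled = [sum(column) / len(token_states) for column in zip(*token_states)]
        allocator_logit = linear(self.allocator_head, pooled)[0]

        # Assumption: the adapter is one linear layer followed by tanh.
        adapted = [math.tanh(v) for v in linear(self.tags_adapter, pooled)]
        tag_probs = [sigmoid(v) for v in linear(self.tags_head, adapted)]
        return HeadOutputs(ner_logits, allocator_logit, tag_probs)
```

Calling `MultiHeadSketch().forward(token_states)` on a list of five 8-dimensional vectors returns five rows of NER logits, one allocator logit, and three tag probabilities. A sigmoid per tag is used here because a message could plausibly belong to several categories at once; the source does not say whether the tags are mutually exclusive.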
rubert-tiny-separater
rubert-tiny-separater is based on sergeyzh/rubert-tiny-turbo with a linear layer on top. The whole model was trained to classify message types from Telegram marketplaces.
Labels:
- Supply: Somebody wants to sell something or provide a service.
- Demand: Somebody wants to buy something or hire someone.
- Noise: Messages unrelated to the topic.
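As a rough illustration of how the linear layer's output could be turned into one of the three labels, the sketch below applies a softmax to three logits and picks the most probable class. The index order of `LABELS` and the use of softmax with argmax are assumptions; the source does not specify them, and `decode` is a hypothetical helper.

```python
import math

# Assumed index order; the actual label mapping is not given in the source.
LABELS = ["Supply", "Demand", "Noise"]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

For example, `decode([2.0, 0.5, -1.0])` selects the first label because its logit is the largest. In practice a confidence threshold on the returned probability could route uncertain messages to Noise, though that is a design choice rather than something the source describes.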
Supported Languages
Russian, with English included.
models
None public yet
datasets
None public yet