PoC Embeddings

non-profit

AI & ML interests

Embeddings

Description

Our goal was to create a Proof of Concept (PoC) solution for matching messages from Telegram marketplaces.

We developed two models:

  • RoSBERTa-hermes-ru: Trained for location recognition, category labeling, and inside-outside location classification.
  • rubert-tiny-separater: Trained for supply and demand classification.

Architecture and Pretraining

RoSBERTa-hermes-ru

RoSBERTa-hermes-ru is based on ai-forever/ru-en-RoSBERTa, with multiple heads for downstream tasks:

  • Backbone: Fully unfrozen and fine-tuned together with the NER head for location recognition.
  • Allocator head: Trained to determine whether a message contains the user's actual location.
  • Tags head with a single adapter layer: Trained to label messages with categories describing the message's context, such as tools, medicine, and clothing.
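
The way the three heads' outputs could be combined into one annotation per message can be sketched as follows. This is a minimal illustration, not the released inference code: the BIO tag scheme (`B-LOC`, `I-LOC`, `O`), the use of sigmoid scores for the allocator and tags heads, the 0.5 threshold, and every name here (`MessageAnnotation`, `decode_bio`, `annotate`) are assumptions made for the example, and the logits are hand-written rather than produced by the model.

```python
import math
from dataclasses import dataclass


@dataclass
class MessageAnnotation:
    locations: list          # location strings decoded from the NER head
    has_user_location: bool  # allocator head decision
    tags: list               # category tags whose score passes the threshold


def sigmoid(x):
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_bio(tokens, bio_tags):
    """Join contiguous B-LOC / I-LOC tokens into location strings (assumed scheme)."""
    spans, current = [], []
    for token, tag in zip(tokens, bio_tags):
        if tag == "B-LOC":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            current.append(token)
        else:
            # "O" tag (or a stray I-LOC with no open span) closes any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def annotate(tokens, bio_tags, allocator_logit, tag_logits, tag_names, threshold=0.5):
    """Combine the three heads' outputs into a single record."""
    tags = [name for name, logit in zip(tag_names, tag_logits)
            if sigmoid(logit) >= threshold]
    return MessageAnnotation(
        locations=decode_bio(tokens, bio_tags),
        has_user_location=sigmoid(allocator_logit) >= threshold,
        tags=tags,
    )


# Hand-written head outputs for one toy message (illustrative values only).
result = annotate(
    tokens=["Selling", "a", "drill", "in", "Nizhny", "Novgorod"],
    bio_tags=["O", "O", "O", "O", "B-LOC", "I-LOC"],
    allocator_logit=2.1,
    tag_logits=[1.7, -3.0, -2.2],
    tag_names=["tools", "medicine", "clothing"],
)
print(result)
```

Treating the tags head as multi-label (one independent score per category) is a design guess based on the description that a message can be marked with different categories; a single-label softmax would be the alternative.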

rubert-tiny-separater

rubert-tiny-separater is based on sergeyzh/rubert-tiny-turbo with a linear layer on top. The whole model was trained to classify message types from Telegram marketplaces.

Labels:

  • Supply: Somebody wants to sell something or provide a service.
  • Demand: Somebody wants to buy something or hire someone.
  • Noise: Messages unrelated to the topic.
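
As a rough illustration of what the linear layer on top does, the sketch below turns a pooled sentence embedding into one logit per label, applies a softmax, and picks the most probable label. It is a toy under stated assumptions: the label order in `LABELS`, the 4-dimensional embedding, and the weights are invented for the example and do not come from the trained model, and the helper names (`linear`, `softmax`, `classify`) are hypothetical.

```python
import math

# Assumed label order; the source lists the labels but not their indices.
LABELS = ["Supply", "Demand", "Noise"]


def linear(features, weights, bias):
    """Linear layer: one logit per label from a pooled embedding."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Return the most probable label and the full probability list."""
    probs = softmax(linear(features, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy embedding and hand-picked weights, for illustration only.
toy_embedding = [0.2, -1.0, 0.5, 0.3]
toy_weights = [
    [1.0, 0.0, 0.0, 0.0],   # Supply row
    [0.0, -2.0, 0.0, 0.0],  # Demand row
    [0.0, 0.0, 0.1, 0.1],   # Noise row
]
toy_bias = [0.0, 0.0, 0.0]

label, probs = classify(toy_embedding, toy_weights, toy_bias)
print(label, [round(p, 3) for p in probs])
```

In the real model the embedding would come from the sergeyzh/rubert-tiny-turbo backbone and the weights from training; only the final projection and label lookup are shown here.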

Supported Languages

Primarily Russian, with English also supported.

models

None public yet

datasets

None public yet