---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
language:
- de
base_model:
- deepset/gbert-base
---

# CareerBERT-G

A Sentence Transformer model fine-tuned on the [ESCO taxonomy](https://esco.ec.europa.eu/classification/occupation_main).

Base model: deepset/gbert-base

Corresponding paper: https://www.sciencedirect.com/science/article/pii/S0957417425006657

## Usage (Sentence-Transformers)

Using this model is straightforward once you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

# '{MODEL_NAME}' is an unfilled placeholder; replace it with this model's Hub repository ID
model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
```
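
Since the model is tagged for sentence similarity, embeddings such as those returned by `model.encode` are typically compared with cosine similarity. The following is a minimal plain-Python sketch of that comparison. The helper name `cosine_similarity`, the dictionary keys, and the toy 3-dimensional vectors are assumptions for illustration only; they are not output of this model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Toy stand-ins for embeddings (hypothetical values, not model output)
resume_vec = [0.2, 0.9, 0.1]
job_vecs = {"job_a": [0.1, 0.8, 0.2], "job_b": [0.9, 0.1, 0.0]}

# Rank the candidate vectors by similarity to the query vector, best first
ranking = sorted(
    job_vecs,
    key=lambda name: cosine_similarity(resume_vec, job_vecs[name]),
    reverse=True,
)
print(ranking)  # ['job_a', 'job_b']
```

With real embeddings the same ranking step applies unchanged; only the vectors come from the model instead of being written by hand.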

## Usage (HuggingFace Transformers)

Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the appropriate pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
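
To make the role of the attention mask concrete, here is a small dependency-free sketch of the same masked averaging for a single sentence. The function name `mean_pool` and the toy values are assumptions for illustration; the padding token is given a deliberately large vector to show that it does not affect the result.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1e-9)  # guard against division by zero for an all-padding input
    return [s / count for s in summed]

# One toy "sentence": two real tokens followed by one padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

A plain average over all three rows would be dominated by the padding vector, which is why the mask is applied before dividing.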

## Evaluation Results

For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})

## Training

The model was trained with the following parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 3695 with parameters:

```
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:

```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```
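
The loss named above treats each anchor's paired positive as the correct class and the other positives in the same batch as negatives, then applies cross-entropy over scaled cosine similarities. The following is a minimal plain-Python sketch of that idea using the listed `scale` of 20.0. The function names and toy vectors are assumptions for illustration, and this is not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy batch (hypothetical vectors, not training data)
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

The loss is near zero when each anchor is most similar to its own positive and grows when another item in the batch is a better match, which is why larger batches supply more negatives per step.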

Parameters of the `fit()` method:

```
{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 11821.1,
    "weight_decay": 0.01
}
```
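
The `WarmupLinear` scheduler is understood here as a learning rate that rises linearly from zero during warmup and then decays linearly back to zero. The sketch below assumes that shape. The function name `warmup_linear_lr` and the step counts are hypothetical, chosen only to keep the arithmetic readable; only the base learning rate of 2e-05 is taken from the parameters above.

```python
def warmup_linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear ramp from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Illustrative numbers only: base_lr matches the card, the step counts are hypothetical
base_lr = 2e-05
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_linear_lr(step, base_lr, warmup_steps=100, total_steps=1000))
```

In this toy configuration the rate reaches its peak at step 100 and returns to zero at step 1000.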

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors

@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT’s performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}