---
pipeline_tag: sentence-similarity
tags:
- formula-transformers
- feature-extraction
- formula-similarity
---
# CLFE(ConMath)
This is a formula embedding model trained on the LaTeX, Presentation MathML, and Content MathML representations of formulas: it maps formulas to a 768-dimensional dense vector space. It was introduced in the paper "Math Information Retrieval with Contrastive Learning of Formula Embeddings" (WISE 2023): https://link.springer.com/chapter/10.1007/978-981-99-7254-8_8
## Usage
```bash
pip install -U sentence-transformers
```
Put `MarkuplmTransformerForConMATH.py` into `sentence_transformers/models`, and add `from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH` to `sentence_transformers/models/__init__.py`.
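Both setup steps can be scripted. A minimal sketch, assuming `MarkuplmTransformerForConMATH.py` is in your current working directory (the target path depends on where pip installed the package):
```python
import os
import shutil

import sentence_transformers

# Locate the installed sentence_transformers/models directory
models_dir = os.path.join(os.path.dirname(sentence_transformers.__file__), 'models')

# Copy the custom module next to the built-in ones
shutil.copy('MarkuplmTransformerForConMATH.py', models_dir)

# Register it in models/__init__.py so it can be imported like the built-in modules
with open(os.path.join(models_dir, '__init__.py'), 'a') as f:
    f.write('\nfrom .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH\n')
```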
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('Jyiyiyiyi/CLFE_ConMath')

# The same formula (13 * x) in its three supported representations
latex = r"13\times x"
pmml = r"<math><semantics><mrow><mn>13</mn><mo>×</mo><mi>x</mi></mrow></semantics></math>"
cmml = r"<math><apply><times></times><cn>13</cn><ci>x</ci></apply></math>"

# LaTeX inputs use the 'latex' key; both MathML variants use the 'mathml' key
embedding_latex = model.encode([{'latex': latex}])
embedding_pmml = model.encode([{'mathml': pmml}])
embedding_cmml = model.encode([{'mathml': cmml}])

print('LaTeX embedding:')
print(embedding_latex)
print('Presentation MathML embedding:')
print(embedding_pmml)
print('Content MathML embedding:')
print(embedding_cmml)
```
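Since this is a sentence-similarity model, the embeddings are typically compared with cosine similarity. A short sketch continuing from the snippet above, using the `util` helpers that ship with sentence-transformers:
```python
from sentence_transformers import util

# Cosine similarity between the LaTeX embedding and each MathML embedding;
# higher scores indicate more similar formulas.
print(util.cos_sim(embedding_latex, embedding_pmml))
print(util.cos_sim(embedding_latex, embedding_cmml))
```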
## Full Model Architecture
```
SentenceTransformer(
(0): Asym(
(latex-0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(mathml-0): MarkuplmTransformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MarkupLMModel
)
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```
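For reference, the asymmetric routing above can be reproduced with sentence-transformers' `models.Asym` module. A hypothetical sketch: the base checkpoints and the `MarkuplmTransformerForConMATH` constructor arguments below are assumptions for illustration, not taken from this card:
```python
from sentence_transformers import SentenceTransformer, models
from sentence_transformers.models import MarkuplmTransformerForConMATH  # registered in the setup step above

# Hypothetical base checkpoints; the card only names the architectures (MPNet, MarkupLM)
latex_encoder = models.Transformer('microsoft/mpnet-base', max_seq_length=512)
mathml_encoder = MarkuplmTransformerForConMATH('microsoft/markuplm-base', max_seq_length=512)

# Asym routes dict inputs by key: {'latex': ...} -> MPNet, {'mathml': ...} -> MarkupLM
asym = models.Asym({'latex': [latex_encoder], 'mathml': [mathml_encoder]})
pooling = models.Pooling(word_embedding_dimension=768, pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[asym, pooling])
```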
## Citing & Authors
```bibtex
@inproceedings{wang2023math,
  title={Math Information Retrieval with Contrastive Learning of Formula Embeddings},
  author={Wang, Jingyi and Tian, Xuedong},
  booktitle={International Conference on Web Information Systems Engineering},
  pages={97--107},
  year={2023},
  organization={Springer}
}
```