---
pipeline_tag: sentence-similarity
tags:
- formula-transformers
- feature-extraction
- formula-similarity

---

# CLFE(ConMath)

This is a formula embedding model trained on the LaTeX, Presentation MathML, and Content MathML representations of formulas. It maps formulas to a 768-dimensional dense vector space. It was introduced in [Math Information Retrieval with Contrastive Learning of Formula Embeddings](https://link.springer.com/chapter/10.1007/978-981-99-7254-8_8).


## Usage


First, install the `sentence-transformers` package:

```
pip install -U sentence-transformers
```
Copy `MarkuplmTransformerForConMATH.py` into `sentence_transformers/models`, and add `from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH` to `sentence_transformers/models/__init__.py`.

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# The same formula (13 × x) in its three supported representations.
latex = r"13\times x"
pmml = r"<math><semantics><mrow><mn>13</mn><mo>×</mo><mi>x</mi></mrow></semantics></math>"
cmml = r"<math><apply><times></times><cn>13</cn><ci>x</ci></apply></math>"

model = SentenceTransformer('Jyiyiyiyi/CLFE_ConMath')

# Pass LaTeX under the 'latex' key and either MathML variant under the
# 'mathml' key; each call returns a 768-dimensional embedding.
embedding_latex = model.encode([{'latex': latex}])
embedding_pmml = model.encode([{'mathml': pmml}])
embedding_cmml = model.encode([{'mathml': cmml}])

print('LaTeX embedding:')
print(embedding_latex)
print('Presentation MathML embedding:')
print(embedding_pmml)
print('Content MathML embedding:')
print(embedding_cmml)
```
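Since the model targets formula similarity, the returned embeddings are typically compared with cosine similarity (for example via `sentence_transformers.util.cos_sim`). A minimal self-contained sketch of that comparison, using small placeholder vectors rather than real 768-dimensional model output:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for 768-dimensional embeddings.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]

print(cos_sim(v1, v2))  # identical vectors -> 1.0
print(cos_sim(v1, v3))  # orthogonal vectors -> 0.0
```

In practice you would pass `embedding_latex`, `embedding_pmml`, and `embedding_cmml` from the example above; embeddings of the same formula in different markups should score close to 1.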



## Full Model Architecture
```
SentenceTransformer(
  (0): Asym(
    (latex-0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
    (mathml-0): MarkuplmTransformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MarkupLMModel 
  )
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```
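The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): the per-token embeddings produced by the transformer are averaged, ignoring padding positions, to yield one fixed-size formula vector. A toy NumPy illustration of that step (3 dimensions instead of 768):

```python
import numpy as np

# Toy token embeddings: 4 tokens, 3 dims (the real model uses 768 dims).
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],  # padding token
    [0.0, 0.0, 0.0],  # padding token
])
attention_mask = np.array([1, 1, 0, 0])  # 1 = real token, 0 = padding

# Mean pooling: average only over non-padded tokens.
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(sentence_embedding)  # [2. 2. 2.]
```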

## Citing & Authors

```
@inproceedings{wang2023math,
  title={Math Information Retrieval with Contrastive Learning of Formula Embeddings},
  author={Wang, Jingyi and Tian, Xuedong},
  booktitle={International Conference on Web Information Systems Engineering},
  pages={97--107},
  year={2023},
  organization={Springer}
}
```