---
pipeline_tag: sentence-similarity
tags:
- formula-transformers
- feature-extraction
- formula-similarity
---

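# CLFE (ConMath)

This is a formula embedding model trained on the LaTeX, Presentation MathML, and Content MathML representations of formulas. It maps a formula in any of these three formats to a 768-dimensional dense vector space. It was introduced in [Math Information Retrieval with Contrastive Learning of Formula Embeddings](https://link.springer.com/chapter/10.1007/978-981-99-7254-8_8).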


## Usage

```shell
pip install -U sentence-transformers
```
Next, copy `MarkuplmTransformerForConMATH.py` into the `sentence_transformers/models` directory of your installation, and add this import to `sentence_transformers/models/__init__.py`:

`from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH`
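
If you prefer to script this step, the sketch below performs the same two edits using only the standard library. `register_conmath_module` is a hypothetical helper, not part of sentence-transformers; it assumes you have already downloaded `MarkuplmTransformerForConMATH.py` and that you pass the path of the installed `sentence_transformers/models` directory. Because it modifies the installed package, it has to be rerun after reinstalling or upgrading sentence-transformers.

```python
import shutil
from pathlib import Path

# The import line the instructions above ask you to add to models/__init__.py
IMPORT_LINE = "from .MarkuplmTransformerForConMATH import MarkuplmTransformerForConMATH"


def register_conmath_module(module_file: Path, models_dir: Path) -> None:
    """Hypothetical helper: copy the custom module and expose it from models/__init__.py.

    Safe to run more than once: the import line is appended only if it is missing.
    """
    module_file = Path(module_file)
    models_dir = Path(models_dir)

    # Step 1: place the module file next to the built-in sentence-transformers modules.
    shutil.copy(module_file, models_dir / module_file.name)

    # Step 2: append the import to the package's __init__.py if it is not there yet.
    init_py = models_dir / "__init__.py"
    text = init_py.read_text(encoding="utf-8") if init_py.exists() else ""
    if IMPORT_LINE not in text:
        if text and not text.endswith("\n"):
            text += "\n"
        init_py.write_text(text + IMPORT_LINE + "\n", encoding="utf-8")
```

For example, after `import sentence_transformers.models`, calling `register_conmath_module(Path("MarkuplmTransformerForConMATH.py"), Path(sentence_transformers.models.__file__).parent)` targets the active installation; the membership check makes repeated runs harmless.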

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# The same formula, 13 × x, in three markup formats
latex = r"13\times x"
pmml = r"<math><semantics><mrow><mn>13</mn><mo>×</mo><mi>x</mi></mrow></semantics></math>"  # Presentation MathML
cmml = r"<math><apply><times></times><cn>13</cn><ci>x</ci></apply></math>"  # Content MathML

model = SentenceTransformer('Jyiyiyiyi/CLFE_ConMath')

# The input key selects the encoder: 'latex' for LaTeX, 'mathml' for both MathML variants
embedding_latex = model.encode([{'latex': latex}])
embedding_pmml = model.encode([{'mathml': pmml}])
embedding_cmml = model.encode([{'mathml': cmml}])

# Each call returns an array of shape (1, 768): one row per input formula
print('LaTeX embedding:')
print(embedding_latex)
print('Presentation MathML embedding:')
print(embedding_pmml)
print('Content MathML embedding:')
print(embedding_cmml)
```
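
Because all three formats are mapped into the same 768-dimensional space, their embeddings can be compared directly. The stdlib-only sketch below assumes cosine similarity as the comparison measure (the card does not specify one); `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and with the model loaded you could use `sentence_transformers.util.cos_sim` instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by_similarity(embedding_latex[0], [embedding_pmml[0], embedding_cmml[0]])` scores both MathML encodings of 13 × x against the LaTeX one; the actual scores depend on the trained weights.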



## Full Model Architecture
```
SentenceTransformer(
  (0): Asym(
    (latex-0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
    (mathml-0): MarkuplmTransformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MarkupLMModel
  )
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```
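
Read top to bottom, the architecture has two stages: `Asym` sends each input to the encoder named by its dict key (`latex` to MPNet, `mathml` to MarkupLM), and the shared `Pooling` layer, with `pooling_mode_mean_tokens` enabled, averages the token embeddings over non-padding positions to produce one vector. The pure-Python sketch below mirrors that flow under simplifying assumptions: `toy_latex_encoder` and `toy_mathml_encoder` are hypothetical stand-ins that return fixed 3-dimensional token vectors instead of running MPNetModel or MarkupLMModel, and `mean_pool` raises on an all-padding input, whereas sentence-transformers clamps the token count to avoid dividing by zero.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical encoder signature: text -> (token embeddings, attention mask)
Encoder = Callable[[str], Tuple[List[Vector], List[int]]]


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> Vector:
    """Average token vectors over positions where attention_mask is 1 (mean pooling)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j in range(dim):
                total[j] += vec[j]
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]


def encode(inputs: Dict[str, str], encoders: Dict[str, Encoder]) -> Vector:
    """Asym-style routing: the single dict key picks the encoder; pooling is shared."""
    if len(inputs) != 1:
        raise ValueError("expected exactly one key, e.g. {'latex': ...} or {'mathml': ...}")
    key, text = next(iter(inputs.items()))
    tokens, mask = encoders[key](text)
    return mean_pool(tokens, mask)


# Hypothetical toy encoders standing in for MPNetModel / MarkupLMModel.
def toy_latex_encoder(text: str) -> Tuple[List[Vector], List[int]]:
    # Two real tokens plus one padding position (mask 0)
    return [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 1, 0]


def toy_mathml_encoder(text: str) -> Tuple[List[Vector], List[int]]:
    return [[0.0, 4.0, 0.0], [0.0, 2.0, 6.0]], [1, 1]


ENCODERS: Dict[str, Encoder] = {"latex": toy_latex_encoder, "mathml": toy_mathml_encoder}
```

Swapping the key from `'latex'` to `'mathml'` changes only which encoder runs; the pooling step is shared, so both branches emit vectors of the same dimensionality.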

## Citing & Authors

```bibtex
@inproceedings{wang2023math,
  title={Math Information Retrieval with Contrastive Learning of Formula Embeddings},
  author={Wang, Jingyi and Tian, Xuedong},
  booktitle={International Conference on Web Information Systems Engineering},
  pages={97--107},
  year={2023},
  organization={Springer}
}
```