---
license: apache-2.0
pipeline_tag: image-to-text
datasets:
- hoang-quoc-trung/fusion-image-to-latex-datasets
tags:
- img2latex
- latex ocr
- Printed Mathematical Expression Recognition
- Handwritten Mathematical Expression Recognition
---

# <font color="turquoise"> <p style="text-align:center"> Translating Math Formula Images To LaTeX Sequences </p> </font>


Scaling Up Image-to-LaTeX Performance: Sumen, an End-to-End Transformer Model with a Large Dataset

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/Rh6_Pu3wE9y3cILsl5BLb.png)

## Performance

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/lm56bL2NCX-ZdbmIjCzWO.png)


![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/PRcJhuPmFEPbmOPSS1ZIt.png)

## Uses

#### Source code: https://github.com/hoang-quoc-trung/sumen

#### Inference
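The card metadata declares `pipeline_tag: image-to-text`, so the checkpoint should also load through the high-level `pipeline` API for a quick smoke test. This is a hedged sketch, not part of the original instructions; the reference snippet below is the authoritative usage.

```python
from transformers import pipeline

# Hedged sketch: relies on the `image-to-text` pipeline tag declared in the
# card metadata; the full snippet below is the reference usage.
latex_ocr = pipeline("image-to-text", model="hoang-quoc-trung/sumen-base")
print(latex_ocr("https://raw.githubusercontent.com/hoang-quoc-trung/sumen/main/assets/example_1.png"))
```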

```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

# Load model & processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VisionEncoderDecoderModel.from_pretrained('hoang-quoc-trung/sumen-base').to(device)
processor = AutoProcessor.from_pretrained('hoang-quoc-trung/sumen-base')
task_prompt = processor.tokenizer.bos_token
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids
# Load image
img_url = 'https://raw.githubusercontent.com/hoang-quoc-trung/sumen/main/assets/example_1.png'
image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
pixel_values = processor.image_processor(
    image,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values
# Generate LaTeX expression
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
# Decode the prediction and strip the special tokens (EOS/PAD/BOS)
sequence = processor.tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(
    processor.tokenizer.eos_token, ""
).replace(
    processor.tokenizer.pad_token, ""
).replace(
    processor.tokenizer.bos_token, ""
)
print(sequence)
```
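The snippet above handles a single image. For several formula images at once, the same objects (`model`, `processor`, `device`, `decoder_input_ids`) can be reused with batched inputs. The sketch below is an assumption based on standard `transformers` batching behavior rather than code from the repository, and the file names are hypothetical.

```python
from PIL import Image

# Hypothetical local files; reuses model, processor, device and
# decoder_input_ids from the snippet above.
paths = ["formula_1.png", "formula_2.png"]
images = [Image.open(p).convert("RGB") for p in paths]
pixel_values = processor.image_processor(
    images,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values
# Repeat the BOS prompt once per image in the batch
batch_decoder_input_ids = decoder_input_ids.repeat(len(images), 1)
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=batch_decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
# skip_special_tokens drops BOS/EOS/PAD during decoding
for path, latex in zip(paths, processor.tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True)):
    print(path, latex)
```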