---
license: apache-2.0
pipeline_tag: image-to-text
---

# <font color="turquoise"> <p style="text-align:center"> Translating Math Formula Images To LaTeX Sequences </p> </font>

Scaling Up Image-to-LaTeX Performance: Sumen, an End-to-End Transformer Model with a Large Dataset

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/Rh6_Pu3wE9y3cILsl5BLb.png)

## Performance

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/lm56bL2NCX-ZdbmIjCzWO.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639ca4299e1c02384ee5d753/PRcJhuPmFEPbmOPSS1ZIt.png)

## Uses

#### Source code: https://github.com/hoang-quoc-trung/sumen

#### Inference

```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

# Load model & processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VisionEncoderDecoderModel.from_pretrained('hoang-quoc-trung/sumen-base').to(device)
processor = AutoProcessor.from_pretrained('hoang-quoc-trung/sumen-base')
task_prompt = processor.tokenizer.bos_token
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids

# Load image
img_url = 'https://raw.githubusercontent.com/hoang-quoc-trung/sumen/main/assets/example_1.png'
image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
pixel_values = processor.image_processor(
    image,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values

# Generate LaTeX expression
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
sequence = processor.tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(
    processor.tokenizer.eos_token, ""
).replace(
    processor.tokenizer.pad_token, ""
).replace(processor.tokenizer.bos_token, "")
print(sequence)
```
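
The snippet above downloads an example image over HTTP; local files go through the exact same processor and generation call. Below is a minimal sketch of that variant, assuming the `model`, `processor`, `device`, and `decoder_input_ids` objects created above are still in scope and that a formula image exists at `./formula.png` (a hypothetical local path used only for illustration).

```python
# Minimal variant: run the same inference on a local image file.
# Assumes `model`, `processor`, `device`, and `decoder_input_ids` from the snippet above.
from PIL import Image

image = Image.open("./formula.png").convert("RGB")  # hypothetical local path
pixel_values = processor.image_processor(
    image,
    return_tensors="pt",
    data_format="channels_first",
).pixel_values

with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=4,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

# Decode and strip the special tokens, as in the snippet above
latex = processor.tokenizer.batch_decode(outputs.sequences)[0]
for tok in (processor.tokenizer.eos_token, processor.tokenizer.pad_token, processor.tokenizer.bos_token):
    latex = latex.replace(tok, "")
print(latex.strip())
```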