schmidt-sebastian committed · Commit e2853d5 · verified · 1 parent: 32d0d16

Update README.md

Files changed (1): README.md +75 -3
README.md CHANGED
@@ -1,3 +1,75 @@
- ---
- license: gemma
- ---
+ ---
+ license: gemma
+ base_model: google/Gemma-3-27B-IT
+ pipeline_tag: text-generation
+ tags:
+ - chat
+ extra_gated_heading: Access Gemma3-27B-IT on Hugging Face
+ extra_gated_prompt: >-
+   To access Gemma3-27B-IT on Hugging Face, you are required to review and agree
+   to the Gemma license. To do this, please ensure you are logged in to
+   Hugging Face and click below. Requests are processed immediately.
+ extra_gated_button_content: Acknowledge license
+ ---
14
+ 
+ # litert-community/Gemma3-27B-IT
+ 
+ This model provides a few variants of
+ [google/Gemma-3-27B-IT](https://huggingface.co/google/Gemma-3-27B-IT) that are ready for
+ deployment on the Web using the
+ [MediaPipe LLM Inference API](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference).
+ 
+ ### Web
+ 
+ * Build and run our [sample web app](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/js/README.md).
+ 
+ To add the model to your web app, please follow the instructions in our [documentation](https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inference/web_js).
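In short, the documented flow is to download the `.task` bundle and create an `LlmInference` instance from it. A minimal sketch follows; `FilesetResolver.forGenAiTasks` and `LlmInference.createFromOptions` are the MediaPipe web API described in the docs, while `buildOptions` and the hosted model path are our own illustrations.

```javascript
// Our own small helper (not part of the MediaPipe API) that assembles the
// LlmInference configuration, so it can be inspected outside the browser.
function buildOptions(modelAssetPath) {
  return {
    baseOptions: { modelAssetPath }, // URL of the downloaded .task bundle
    maxTokens: 1280, // matches the KV cache size used in the benchmarks below
  };
}

// In the browser (API names per the MediaPipe LLM Inference web docs):
//
//   import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';
//
//   const genai = await FilesetResolver.forGenAiTasks(
//       'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
//   const llm = await LlmInference.createFromOptions(
//       genai, buildOptions('/models/gemma3-27b-it-int8-web.task'));
//   console.log(await llm.generateResponse('Why is the sky blue?'));
```

Note that the `.task` file is ~27 GB, so you will want to host it yourself and cache it on the client rather than re-download it per session.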
+ 
+ ## Performance
+ 
+ ### Web
+ 
+ All benchmark stats were measured on a 2024 MacBook Pro (Apple M4 Max) running Chrome, with a 1280-token KV cache, 1024 prefill tokens, and 256 decode tokens.
+ 
+ <table border="1">
+ <tr>
+ <th></th>
+ <th>Precision</th>
+ <th>Backend</th>
+ <th>Prefill (tokens/sec)</th>
+ <th>Decode (tokens/sec)</th>
+ <th>Time-to-first-token (sec)</th>
+ <th>GPU Memory</th>
+ <th>CPU Memory</th>
+ <th>Model size</th>
+ <th></th>
+ </tr>
+ <tr>
+ <td><p style="text-align: left">F16</p></td>
+ <td><p style="text-align: left">int8</p></td>
+ <td><p style="text-align: left">GPU</p></td>
+ <td><p style="text-align: right">166 tk/s</p></td>
+ <td><p style="text-align: right">8 tk/s</p></td>
+ <td><p style="text-align: right">15.0 s</p></td>
+ <td><p style="text-align: right">26.8 GB</p></td>
+ <td><p style="text-align: right">1.5 GB</p></td>
+ <td><p style="text-align: right">27.05 GB</p></td>
+ <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Gemma3-27B-IT/resolve/main/gemma3-27b-it-int8-web.task">&#128279;</a></p></td>
+ </tr>
+ <tr>
+ <td><p style="text-align: left">F32</p></td>
+ <td><p style="text-align: left">int8</p></td>
+ <td><p style="text-align: left">GPU</p></td>
+ <td><p style="text-align: right">98 tk/s</p></td>
+ <td><p style="text-align: right">8 tk/s</p></td>
+ <td><p style="text-align: right">15.0 s</p></td>
+ <td><p style="text-align: right">27.8 GB</p></td>
+ <td><p style="text-align: right">1.5 GB</p></td>
+ <td><p style="text-align: right">27.05 GB</p></td>
+ <td><p style="text-align: left"><a style="text-decoration: none" href="https://huggingface.co/litert-community/Gemma3-27B-IT/resolve/main/gemma3-27b-it-int8-web.task">&#128279;</a></p></td>
+ </tr>
+ </table>
+ 
+ * Model size: size of the .tflite flatbuffer (the serialization format for LiteRT models).
+ * int8: quantized model with int8 weights and float activations.
+ * GPU memory: "GPU Process" memory for all of Chrome while running; before any model loading, Chrome used 130-530 MB.
+ * CPU memory: memory for the entire tab while running; before any model loading, the tab used 30-60 MB.
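As a rough cross-check of the table (our own arithmetic, not an official breakdown): prefilling 1024 tokens at the listed prefill rates accounts for only part of the reported 15.0 s time-to-first-token, so the remainder presumably covers startup work such as uploading the ~27 GB of weights to the GPU.

```javascript
// Back-of-envelope check on the benchmark table above (our own arithmetic).
const prefillTokens = 1024;
const rows = [
  { variant: 'F16', prefillTps: 166, ttftSec: 15.0 },
  { variant: 'F32', prefillTps: 98, ttftSec: 15.0 },
];
for (const { variant, prefillTps, ttftSec } of rows) {
  const prefillSec = prefillTokens / prefillTps;
  const otherSec = ttftSec - prefillSec; // startup cost not explained by prefill alone
  console.log(
    `${variant}: prefill ~${prefillSec.toFixed(1)} s, ` +
      `other startup ~${otherSec.toFixed(1)} s of ${ttftSec.toFixed(1)} s TTFT`
  );
}
```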