Commit da3a772 (1 parent: 2ca9f7d) · 2024-02-06 14:50:54
Publish script update
app.py CHANGED

@@ -29,7 +29,7 @@ To do that we added following text to the query:
 
 When measuring execution time we used `time.time()` result saved to variable before making the call to API and compared it to `time.time()` result after receiving the results. We used litellm python library for all of the models which naturally adds some overhead compared to pure curl calls.
 
-In order to count tokens we split the output string by whitespace
+In order to count tokens we split the output string by whitespace `\w` regex character. For data which was impossible to obtain through the API, such as model sizes we only used official sources such as developers' release blogs and their documentation.
 
 When it comes to pricing most providers charge per token count, while HuggingFace Endpoints allow the user to choose machine type and host the model repository on it. The user is then charged by the running time of the machine. In this project we attempted to use HF Endpoints as much as possible due to their popularity and transparency of how the model is executed.
 """
@@ -163,13 +163,21 @@ with gr.Blocks() as demo:
     with gr.Row():
         collapse_languages_button.render()
         collapse_output_method_button.render()
-    summary_ui = gr.DataFrame(dataframe_style(summary_df), label="
+    summary_ui = gr.DataFrame(dataframe_style(summary_df), label="Output characteristics")
+    gr.Markdown("""\
+This table compares output characteristics of different models which include execution time, output size and chunking of the output. Some providers and models don't support output chunking, in this case chunk related fields are left empty.
+
+Execution time refers to averaged time needed to execute one query.
+
+To count words we split the output string by whitespace `\w` regex character.
+
+Chunk sizes are measured in the characters count.""")
 
     with gr.Tab("Preformance by time of the day"):
         time_of_day_comparison_ui = gr.DataFrame(dataframe_style(time_of_day_comparison_df), label="Time of day")
         time_of_day_plot_ui = gr.Plot(time_of_day_plot, label="Time of the day plot", scale=1)
         time_periods_explanation_ui = gr.DataFrame(dataframe_style(time_periods_explanation_df), label="Times of day ranges")
-        gr.Markdown("""
+        gr.Markdown("""\
 These measurements were made by testing the models using the same dataset as in the other comparisons every hour for 24 hours.
 
 Execution time refers to averaged time needed to execute one query.
@@ -189,8 +197,7 @@ Hugging Face Inference Endpoints are charged by hour so to compare different pro
 for models hosted this way we calculated "Cost Per Token" column using data collected during the experiment.
 
 Note that pause and resume time cost was not included in the "Cost Per Token" column calculation.
-"""
-)
+""")
     filter_button.click(
         fn=filter_dataframes,
         inputs=filter_textbox,
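The updated docstring counts tokens by splitting the output on whitespace (in regex terms the `\s` class; `\w` matches word characters). A sketch of that counting as one hypothetical helper:

def count_words(output: str) -> int:
    # str.split() with no arguments splits on runs of whitespace and
    # drops empty strings, matching the whitespace split described above.
    return len(output.split())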
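Because Hugging Face Inference Endpoints bill by machine running time rather than per token, the "Cost Per Token" column mentioned in the last hunk has to be derived from the experiment itself. A hypothetical illustration of that derivation (the rate and counts below are made-up inputs, not data from the project):

def cost_per_token(hourly_rate_usd: float, runtime_hours: float, tokens_generated: int) -> float:
    # Spread the total machine cost over every token produced while it ran.
    # Pause and resume time is excluded, as the docstring notes.
    return (hourly_rate_usd * runtime_hours) / tokens_generated

# e.g. a $1.30/hour endpoint running 24 hours while generating 500,000 tokens:
# cost_per_token(1.30, 24, 500_000) -> about $0.0000624 per token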