piotr-szleg-bards-ai committed on
Commit da3a772 · 1 Parent(s): 2ca9f7d

2024-02-06 14:50:54 Publish script update

Files changed (1): app.py (+12 −5)
app.py CHANGED
@@ -29,7 +29,7 @@ To do that we added following text to the query:
 
 When measuring execution time we used `time.time()` result saved to variable before making the call to API and compared it to `time.time()` result after receiving the results. We used litellm python library for all of the models which naturally adds some overhead compared to pure curl calls.
 
-In order to count tokens we split the output string by whitespace \w regex character. For data which was impossible to obtain through the API, such as model sizes we only used official sources such as developers' release blogs and their documentation.
+In order to count tokens we split the output string by whitespace `\w` regex character. For data which was impossible to obtain through the API, such as model sizes we only used official sources such as developers' release blogs and their documentation.
 
 When it comes to pricing most providers charge per token count, while HuggingFace Endpoints allow the user to choose machine type and host the model repository on it. The user is then charged by the running time of the machine. In this project we attempted to use HF Endpoints as much as possible due to their popularity and transparency of how the model is executed.
 """
@@ -163,13 +163,21 @@ with gr.Blocks() as demo:
     with gr.Row():
         collapse_languages_button.render()
         collapse_output_method_button.render()
-    summary_ui = gr.DataFrame(dataframe_style(summary_df), label="Statistics")
+    summary_ui = gr.DataFrame(dataframe_style(summary_df), label="Output characteristics")
+    gr.Markdown("""\
+This table compares output characteristics of different models which include execution time, output size and chunking of the output. Some providers and models don't support output chunking, in this case chunk related fields are left empty.
+
+Execution time refers to averaged time needed to execute one query.
+
+To count words we split the output string by whitespace `\w` regex character.
+
+Chunk sizes are measured in the characters count.""")
 
     with gr.Tab("Preformance by time of the day"):
         time_of_day_comparison_ui = gr.DataFrame(dataframe_style(time_of_day_comparison_df), label="Time of day")
         time_of_day_plot_ui = gr.Plot(time_of_day_plot, label="Time of the day plot", scale=1)
         time_periods_explanation_ui = gr.DataFrame(dataframe_style(time_periods_explanation_df), label="Times of day ranges")
-    gr.Markdown("""
+    gr.Markdown("""\
 These measurements were made by testing the models using the same dataset as in the other comparisons every hour for 24 hours.
 
 Execution time refers to averaged time needed to execute one query.
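For readers unfamiliar with the UI pattern this hunk edits, here is a self-contained sketch of a labeled, styled DataFrame followed by an explanatory Markdown block. `dataframe_style` and `summary_df` are defined elsewhere in app.py and not shown in this diff, so both are stubbed with placeholders here.

```python
import gradio as gr
import pandas as pd

# Placeholder for the summary_df assembled elsewhere in app.py.
summary_df = pd.DataFrame({"model": ["model-a"], "execution_time [s]": [1.2]})

def dataframe_style(df: pd.DataFrame) -> pd.DataFrame:
    # Stub for the styling helper defined elsewhere in app.py.
    return df

with gr.Blocks() as demo:
    # The pattern the hunk introduces: a labeled table immediately
    # followed by Markdown text explaining its columns.
    summary_ui = gr.DataFrame(dataframe_style(summary_df), label="Output characteristics")
    gr.Markdown("This table compares output characteristics of different models ...")

demo.launch()
```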
@@ -189,8 +197,7 @@ Hugging Face Inference Endpoints are charged by hour so to compare different pro
 for models hosted this way we calculated "Cost Per Token" column using data collected during the experiment.
 
 Note that pause and resume time cost was not included in the "Cost Per Token" column calculation.
-"""
-    )
+""")
     filter_button.click(
         fn=filter_dataframes,
         inputs=filter_textbox,
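The Markdown block closed by this hunk explains the "Cost Per Token" column: Hugging Face Inference Endpoints bill by machine uptime rather than per token, so the column has to be derived from experiment data. A sketch of that arithmetic, with assumed figures (the real hourly rate and token totals come from the experiment, not this diff):

```python
# Assumed values for illustration only; the real figures depend on the
# chosen Endpoints machine type and the collected experiment data.
hourly_rate_usd = 1.30           # machine price per hour
total_runtime_hours = 2.5        # endpoint uptime during the experiment
total_tokens_generated = 180_000

# Hourly billing converted into a per-token figure, comparable with
# providers that charge per token. As the text notes, pause and resume
# time is excluded from this calculation.
cost_per_token = (hourly_rate_usd * total_runtime_hours) / total_tokens_generated
print(f"${cost_per_token:.8f} per token")
```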
 