OpenNMT/Mistral-7B-v0.2-instruct-onmt-awq-gemv


vince62s committed on Dec 20, 2023

Commit 5cdc95b · 1 Parent(s): d3b6ea2

Update README.md

Files changed (1)

README.md +9 -0


@@ -23,14 +23,23 @@ Boston is a great city with many attractions to visit. Here are some popular one
 If you run with a batch size of 60 you can get a nice throughput even with GEMV:
 [2023-12-20 08:27:03,293 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
 [2023-12-20 08:27:03,394 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
 [2023-12-20 08:27:08,346 INFO] Loading data into the model
 step0 time:  1.3734617233276367
 [2023-12-20 08:27:28,197 INFO] PRED SCORE: -0.2994, PRED PPL: 1.35 NB SENTENCES: 59
 [2023-12-20 08:27:28,197 INFO] Total translation time (s): 6.4
 [2023-12-20 08:27:28,197 INFO] Average translation time (ms): 109.1
 [2023-12-20 08:27:28,197 INFO] Tokens per second: 1835.8
 Time w/o python interpreter load/terminate:  24.914613008499146

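
As a rough sanity check on the figures in the log above, the reported numbers can be related to each other with a few lines of Python. This is only a sketch: it assumes the average translation time is the total translation time divided by the number of sentences, and that tokens per second is the generated token count divided by the total translation time. The log does not state how either value is computed, and the function names here are hypothetical helpers, not part of OpenNMT-py.

```python
# Values copied from the log output above.
TOTAL_TRANSLATION_TIME_S = 6.4
NB_SENTENCES = 59
TOKENS_PER_SECOND = 1835.8


def average_time_ms(total_time_s: float, nb_sentences: int) -> float:
    """Average time per sentence in milliseconds (assumed: total / count)."""
    return total_time_s / nb_sentences * 1000.0


def estimated_tokens(tokens_per_second: float, total_time_s: float) -> float:
    """Estimated token count (assumed: throughput * total time)."""
    return tokens_per_second * total_time_s


if __name__ == "__main__":
    avg_ms = average_time_ms(TOTAL_TRANSLATION_TIME_S, NB_SENTENCES)
    tokens = estimated_tokens(TOKENS_PER_SECOND, TOTAL_TRANSLATION_TIME_S)
    print(f"average time per sentence: {avg_ms:.1f} ms")
    print(f"estimated tokens generated: {tokens:.0f}")
    print(f"estimated tokens per sentence: {tokens / NB_SENTENCES:.0f}")
```

Under these assumptions the average comes out close to the 109.1 ms in the log; the small gap is consistent with the total time being rounded to one decimal place. The "Time w/o python interpreter load/terminate" figure is much larger than the translation time, which suggests it also covers checkpoint loading and other setup.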