How to update the v1 & v2 leaderboards?

#18
by StarscreamDeceptions - opened

There are actually two issues here. First, I have updated my model in the original repository, but since I cannot modify the repository name, I am unable to submit to the v2 leaderboard. Second, I noticed some discrepancies between my local evaluation results and the online results, so I would like to request your help in updating the evaluation results on the v1 leaderboard. Please contact me as soon as possible. Thank you, and have a great day!

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
Regarding the first issue, you can submit the model through the submission page. We don't require models submitted to v2 to have been part of v1, as the leaderboard is always open to newly released models!
Can you kindly elaborate further on the second issue? In the meantime, we suggest you start by submitting the model to v2, as v1 is now archived!

Hi @amztheory
Regarding the first issue, I have modified the model name in my repository and submitted it, which is great. As for the second issue, due to my oversight, I failed to submit my model to the v1 leaderboard in time. I realize that the v1 leaderboard is now archived, but would it be possible for you to evaluate my model and update its results there? This is crucial for my project, as I need accurate v1 results for my trained model. I sincerely hope you can help update the v1 leaderboard. Thank you again for your assistance.

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
Thank you for your interest in the v1 benchmarks; however, we no longer support evaluating models on them. It's worth noting that the v2 benchmarks should be more representative of your model's performance on Arabic tasks.
Judging by the date of your response, I believe you are the one who submitted AIDC-AI/Marco-LLM-AR-V2. As you might have noticed, the model failed to be evaluated because it could not be loaded. The issue seems to stem from the fact that your model.safetensors.index.json is missing the 'metadata' property.
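
(For anyone hitting the same loading error: below is a minimal repair sketch, assuming the standard Hugging Face sharded-safetensors layout in which the index file holds a 'metadata' object alongside the 'weight_map'. The path and the total_size computation are illustrative, not the leaderboard's own tooling.)

```python
import json
import os

# Minimal sketch: add a missing 'metadata' property to a sharded
# checkpoint's index file. Assumes the standard Hugging Face layout:
# {"metadata": {"total_size": ...}, "weight_map": {tensor: shard_file}}.
index_path = "model.safetensors.index.json"  # run from the repo root

with open(index_path) as f:
    index = json.load(f)

if "metadata" not in index:
    # 'total_size' is the combined size in bytes of all shard files.
    shard_files = set(index["weight_map"].values())
    total_size = sum(os.path.getsize(name) for name in shard_files)
    index["metadata"] = {"total_size": total_size}
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2)
    print(f"Added metadata with total_size={total_size}")
```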

Let me know once you have fixed these issues, and I will launch the eval again.

Hi, @amztheory
I have fixed these issues, thanks for your help!

Hi @amztheory
I am wondering how you evaluate the model. Why is there such a big gap between the online and offline results?

[screenshots: online leaderboard scores vs. local evaluation results]
It's abnormal.

@amztheory Hope to get your reply soon, thank you!

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
The scores above are for AIDC-AI/Marco-LLM-AR-V2, correct?
I will look into it! Can you confirm that you reproduced the scores with --use-chat-template enabled, since you requested using the chat template upon submission?
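
(Side note for readers: whether a chat template is applied changes the exact text the model is scored on, which can shift benchmark scores substantially. Below is a minimal illustration using the transformers API; the model name is the one from this thread, and the prompt is a made-up example.)

```python
from transformers import AutoTokenizer

# Illustration: the same question, with and without a chat template.
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-LLM-AR-V2")

prompt = "What is the capital of Egypt?"  # hypothetical benchmark question

# Without a chat template, the model is scored on the raw prompt.
print(prompt)

# With a chat template, the prompt is first wrapped in the model's
# dialogue format (special tokens, role markers, generation prompt).
messages = [{"role": "user", "content": prompt}]
print(tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True))
```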

Hi @amztheory
Thanks for your response. It seems the chat template caused this problem. Can you remove our results and let me resubmit?

Hi @amztheory
Looking forward to your reply!

Open Arabic LLM Leaderboard org

@StarscreamDeceptions
I'll relaunch the eval on your model with chat_template disabled. The current scores will be removed.

@amztheory
Thank you, and have a nice day!

Open Arabic LLM Leaderboard org

@StarscreamDeceptions
The leaderboard should now reflect the updated scores for your model!
For future submissions, please select chat_template only if you want it applied.
Keep up the good work!

amztheory changed discussion status to closed