How to update the v1 & v2 leaderboards?

#18
by StarscreamDeceptions - opened

There are actually two issues here. First, I have updated my model in the original repository, but since I cannot modify the repository name, I am unable to submit to the v2 leaderboard. Second, I noticed some discrepancies between my local evaluation results and the online results, so I would like to request your help in updating the evaluation results on the v1 leaderboard. Please contact me as soon as possible. Thank you, and have a great day!

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
Regarding the first issue, you can submit the model through the submission page. We don't require models submitted to v2 to have been part of v1, as the leaderboard is always open to newly released models!
Can you kindly elaborate further on the second issue? In the meantime, we suggest you start by submitting the model to v2, as v1 is now archived!

Hi @amztheory
Regarding the first issue, I have modified the model name in my repository and submitted it, which is great. As for the second issue, due to my oversight, I failed to submit my model to the v1 leaderboard in time. I realize that the v1 leaderboard is now archived, but would it be possible for you to evaluate my model and update its results there? This is crucial for my project, as I need accurate v1 results for my trained model. I sincerely hope you can help update the v1 leaderboard. Thank you again for your assistance.

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
Thank you for your interest in the v1 benchmarks; however, we no longer support evaluating models on them. It's worth noting that the v2 benchmarks should be more representative of your model's performance on Arabic tasks.
Judging by the date of your response, I believe you are the one who submitted AIDC-AI/Marco-LLM-AR-V2. As you might have noticed, the model failed to be evaluated because it could not be loaded. The issue seems to stem from the fact that your model.safetensors.index.json is missing the 'metadata' property.
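
(For anyone hitting the same loading error: below is a minimal repair sketch, assuming the standard Hugging Face sharded-safetensors layout in which the index file holds a 'metadata' object alongside the 'weight_map'. The path and the total_size computation are illustrative, not the leaderboard's own tooling.)

```python
import json
import os

# Minimal sketch: add a missing 'metadata' property to a sharded
# checkpoint's index file. Assumes the standard Hugging Face layout:
# {"metadata": {"total_size": ...}, "weight_map": {tensor: shard_file}}.
index_path = "model.safetensors.index.json"  # run from the repo root

with open(index_path) as f:
    index = json.load(f)

if "metadata" not in index:
    # 'total_size' is the combined size in bytes of all shard files.
    shard_files = set(index["weight_map"].values())
    total_size = sum(os.path.getsize(name) for name in shard_files)
    index["metadata"] = {"total_size": total_size}
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2)
    print(f"Added metadata with total_size={total_size}")
```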

Let me know once you have fixed these issues, and I will launch the eval again.

Hi, @amztheory
I have fixed these issues, thanks for your help!

Hi @amztheory
I am wondering how you evaluate the model. Why is there such a big gap between the online and offline results?

[screenshots: online leaderboard scores vs. local evaluation results]
It's abnormal.

@amztheory Hope to get your reply soon, thank you!

Open Arabic LLM Leaderboard org

Hi @StarscreamDeceptions
The scores above are for AIDC-AI/Marco-LLM-AR-V2, correct?
I will look into it! Can you confirm that you reproduced the scores with --use-chat-template enabled, since you requested using the chat template upon submission?
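
(Side note for readers: whether a chat template is applied changes the exact text the model is scored on, which can shift benchmark scores substantially. Below is a minimal illustration using the transformers API; the model name is the one from this thread, and the prompt is a made-up example.)

```python
from transformers import AutoTokenizer

# Illustration: the same question, with and without a chat template.
tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-LLM-AR-V2")

prompt = "What is the capital of Egypt?"  # hypothetical benchmark question

# Without a chat template, the model is scored on the raw prompt.
print(prompt)

# With a chat template, the prompt is first wrapped in the model's
# dialogue format (special tokens, role markers, generation prompt).
messages = [{"role": "user", "content": prompt}]
print(tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True))
```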

Hi @amztheory
Thanks for your response. It seems the chat template caused this problem. Can you remove our results and let me resubmit?

Hi @amztheory
Looking forward to your reply!

Open Arabic LLM Leaderboard org

@StarscreamDeceptions
I'll relaunch the eval on your model with chat_template disabled. The current scores will be removed.

@amztheory
Thank you, and have a nice day!

Open Arabic LLM Leaderboard org

@StarscreamDeceptions
The leaderboard should now reflect the updated scores for your model!
For future submissions, please select chat_template only if you want it applied.
Keep up the good work!

amztheory changed discussion status to closed