from pathlib import Path

banner_url = "https://huggingface.co/spaces/WildEval/WildBench-Leaderboard/resolve/main/%E2%80%8Eleaderboard_logo_v2.png"  # the same repo here.
BANNER = f'<div style="display: flex; justify-content: space-around;"><img src="{banner_url}" alt="Banner" style="width: 40vw; min-width: 300px; max-width: 600px;"> </div>'
| INTRODUCTION_TEXT= """ | |
| # OS Benchmark (Evaluating LLMs with OS and MCQ) | |
| π [Website](https://github.com/VILA-Lab/MBZUAI-LLM-Leaderboard) | π» [GitHub](https://github.com/VILA-Lab/MBZUAI-LLM-Leaderboard) | π [Paper](#) | π¦ [Tweet 1](#) | π¦ [Tweet 2](#) | |
| > ### Open-LLM-Leaderboard,for evaluating large language models (LLMs) by transitioning from multiple-choice questions (MCQs) to open-style questions. | |
| This approach addresses the inherent biases and limitations of MCQs, such as selection bias and the effect of random guessing. By utilizing open-style questions, | |
| the framework aims to provide a more accurate assessment of LLMs' abilities across various benchmarks and ensure that the evaluation reflects true capabilities, | |
| particularly in terms of language understanding and reasoning. | |
| """ | |
| CITATION_TEXT = """@artical{.., | |
| title={Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena}, | |
| author={}, | |
| year={2024}, | |
| archivePrefix={arXiv} | |
| } | |
| """ |