Update src/about.py
src/about.py  +16 -27
CHANGED
@@ -54,33 +54,22 @@ we require all submitters to provide either a link to a paper/blog/report that i
"""

EVALUATION_QUEUE_TEXT = """
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-### 3) Make sure your model has an open license!
-This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
-
-### 4) Fill up your model card
-When we add extra information about models to the leaderboard, it will be automatically taken from the model card
-
-## In case of model failure
-If your model is displayed in the `FAILED` category, its execution stopped.
-Make sure you have followed the above steps first.
-If everything is done, check you can launch the EleutherAIHarness on your model locally, using the above command without modifications (you can add `--limit` to limit the number of examples per task).
+# Instructions to submit results
+
+- First, make sure you've read the content in the About section.
+- Test your model locally and submit your results in the form below.
+- Upload **one** result at a time by filling in the form and clicking "Upload One Eval"; the result will then appear in the "Uploaded results" section.
+- Continue uploading until all results are uploaded, then click "Submit All". After the Space restarts, your results will appear on the leaderboard, marked as checking.
+- If your uploaded results contain an error, click "Click Upload" and re-upload all results.
+- If there is an error in submitted results, you can upload a replacement; we will use the latest submitted results during our review.
+- If there is an error in "checked" results, email us to withdraw them.
+
+# Detailed settings
+
+**Score**: float, the corresponding evaluation number.
+**Name**: str, **less than 3 words**, an abbreviation representing your work; it can be a model name or paper keywords.
+**BaseModel**: str, the LMM used by the agent; the unique HF model id is suggested.
+**Target-research**: (1) `Model-Eval-Online` and `Model-Eval-Global` represent the standard setting proposed in our paper; this setting is used to test model capability. (2) `Agent-Eval-Prompt`: any agent design that uses fixed model weights, including RAG, memory, etc. (3) `Agent-Eval-Finetune`: the model weights are changed and trained on in-domain (same environment) data.
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
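The "Detailed settings" above describe what each form field should contain, but the diff does not show how the Space stores a submission. The following is a minimal sketch, assuming each uploaded eval is kept as a flat JSON record keyed by those field names; the function name, key names, and validation rules are illustrative assumptions, not this Space's actual code.

```python
# Sketch of one uploaded eval record, assuming the form fields map
# one-to-one onto keys in a JSON record (hypothetical schema, for illustration).
import json

ALLOWED_TARGETS = {
    "Model-Eval-Online",
    "Model-Eval-Global",
    "Agent-Eval-Prompt",
    "Agent-Eval-Finetune",
}

def build_record(score: float, name: str, base_model: str, target: str) -> dict:
    """Validate one form submission and return it as a plain dict."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"Unknown Target-research value: {target}")
    if len(name.split()) >= 3:
        # "less than 3 words" per the instructions above
        raise ValueError("Name should be less than 3 words")
    return {
        "Score": float(score),
        "Name": name,
        "BaseModel": base_model,  # ideally the unique HF model id
        "Target-research": target,
    }

if __name__ == "__main__":
    record = build_record(0.42, "MyAgent", "meta-llama/Llama-3.1-8B-Instruct", "Agent-Eval-Prompt")
    print(json.dumps(record, indent=2))
```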
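For context, constants like `EVALUATION_QUEUE_TEXT` and `CITATION_BUTTON_LABEL` in a leaderboard Space's about.py are usually rendered by the Gradio app. A minimal sketch of that pattern follows, assuming a typical Gradio Blocks app; the import path, tab layout, and element choices are assumptions, not this Space's actual app.py.

```python
# Sketch of how these about.py constants are commonly rendered in a
# Gradio leaderboard Space (layout is an assumption, not this Space's code).
import gradio as gr

from src.about import EVALUATION_QUEUE_TEXT, CITATION_BUTTON_LABEL

with gr.Blocks() as demo:
    with gr.Tab("Submit here"):
        gr.Markdown(EVALUATION_QUEUE_TEXT)  # the text edited in this commit
    with gr.Accordion("Citation", open=False):
        gr.Textbox(label=CITATION_BUTTON_LABEL, lines=6, show_copy_button=True)

if __name__ == "__main__":
    demo.launch()
```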